Object Recognition with CoreML, Vision and SwiftUI on iOS

Introduction

This blog post, first published in early 2021, has been continually updated to reflect the latest advancements in SwiftUI and related Apple technologies. Should you encounter any issues or have questions, please contact me at [email protected].

WWDC 2021 is only a week away from today (May 31, 2021).

The SwiftUI framework has been out for about two years and has gained tremendous momentum in the iOS developer community.

However, it is still at a relatively early stage. One of the most significant gaps for AI and machine learning practitioners like myself is that there are no native SwiftUI views for camera-related use cases.

In this article, I will show you how to use SwiftUI to wrap UIImagePickerController from the UIKit framework and create an app that tells whether an object is a hot dog or not.

This way, we can pick an existing image from our photo library or take a new picture with the built-in camera on iOS.

The complete code repo can be downloaded on GitHub:

https://github.com/theleonwei/seefood


Step 1: The original SeeFood app

We will make an app called SeeFood. The original idea came from an episode of Silicon Valley. If you have not watched the show, here is a quick summary of what Jian-Yang's app does.

As you can see, Jian-Yang's app has a fundamental flaw: it can only tell whether an object is a hot dog or not, so a pizza is simply classified as "not hot dog." We will extend our app's prediction categories to a few thousand everyday items, such as different fruits, and significantly broaden SeeFood's use cases.

 

Step 2: Start the iOS project

Make sure you are creating an iOS app. 

[Screenshot: creating a new iOS app in Xcode]

[Screenshot: new project options for the iOS app]

 

Step 3: Wrap UIImagePickerController

Let's add an image viewer.

 

In ContentView.swift, let's add two system images; tapping either of them will bring up a sheet with the image picker interface.

 

struct ContentView: View {

  var body: some View {
      HStack{
        Image(systemName: "photo")
        Image(systemName: "camera")
      }
      .font(.largeTitle)
      .foregroundColor(.blue)
  }
}

 

Next, let's add a placeholder for the image we will feed into the ML model:

 

struct ContentView: View {

  var body: some View {
    VStack{
      HStack{
        Image(systemName: "photo")
        Image(systemName: "camera")
      }
      .font(.largeTitle)
      .foregroundColor(.blue)

      Rectangle()
        .strokeBorder()
        .foregroundColor(.yellow)
    }
    .padding()

  }

}

 

Now let's create our ImagePicker view.

Create a new file, ImagePicker.swift, and import both SwiftUI and UIKit.

This is what the code looks like; the ImagePicker struct conforms to the UIViewControllerRepresentable protocol.

 

import SwiftUI
import UIKit

struct ImagePicker: UIViewControllerRepresentable {    
}

 

[Screenshot: creating a UIViewControllerRepresentable struct]

 

Xcode will show an error message. Click the red dot, choose Fix, and replace the placeholder type with UIImagePickerController.

[Screenshot: Xcode fix-it for the UIViewControllerRepresentable conformance]

 

There are still errors. Click the red dot and the Fix button again.

Xcode will automatically add two more function stubs (required by the UIViewControllerRepresentable protocol).

[Screenshot: protocol stubs added by Xcode]
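After applying the fixes, the generated code is roughly equivalent to the following (the exact placeholder text varies between Xcode versions; the body of makeUIViewController below is a temporary placeholder so the file compiles):

import SwiftUI
import UIKit

struct ImagePicker: UIViewControllerRepresentable {

  // Stub required by UIViewControllerRepresentable; we fill this in next.
  func makeUIViewController(context: Context) -> UIImagePickerController {
    UIImagePickerController() // temporary placeholder body
  }

  // Stub required by UIViewControllerRepresentable; we will leave it empty.
  func updateUIViewController(_ uiViewController: UIImagePickerController, context: Context) {
  }
}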

We will not be using the updateUIViewController function, so we can leave its body empty.

Inside the makeUIViewController function, we need to return a UIImagePickerController object. This is where we initialize it, and we can also customize it with a few options, e.g., choosing between a camera view and an image selection view.

Let's first try out its photo library option.

Here is what our code looks like:

import SwiftUI
import UIKit

struct ImagePicker: UIViewControllerRepresentable {

  func makeUIViewController(context: Context) -> UIImagePickerController {
    let imagePicker = UIImagePickerController()
    imagePicker.sourceType = .photoLibrary
    return imagePicker
  }

  func updateUIViewController(_ uiViewController: UIImagePickerController, context: Context) {
  }

  typealias UIViewControllerType = UIImagePickerController

}

 

Now let's try out our ImagePicker view.

Inside ContentView.swift, we need to add a state variable so the system knows when the ImagePicker view is presented.

Add the following:

@State var isPresenting: Bool = false

We need to set it to true to launch our image picker when the photo icon is tapped.

 Image(systemName: "photo")
    .onTapGesture {
        isPresenting = true
    }

 

And attach this modifier immediately after our VStack's closing brace.

.sheet(isPresented: $isPresenting){
      ImagePicker()
}

 

The code will look like the following:

import SwiftUI

struct ContentView: View {

  @State var isPresenting: Bool = false

  var body: some View {
    VStack{
      HStack{
        Image(systemName: "photo")
          .onTapGesture {
            isPresenting = true
          }         

        Image(systemName: "camera")
      }
      .font(.largeTitle)
      .foregroundColor(.blue)       

      Rectangle()
        .strokeBorder()
        .foregroundColor(.yellow)
    }
    .sheet(isPresented: $isPresenting){
      ImagePicker()
    }
    .padding()

  }

}

 

Hit Command-R to run the app in a simulator, and you should see something like the following.

 

[Screenshot: the photo library picker on the iOS simulator]

 

As you can see, we can now bring up our photo library, but nothing happens after we select a photo yet. Let's fix this.

Inside our ImagePicker.swift file, we need to add a coordinator class so that UIImagePickerController can notify our ImagePicker struct that the user has selected an image.

Let's add a function called makeCoordinator(), which constructs a coordinator that acts as the delegate for UIImagePickerController.

func makeCoordinator() -> Coordinator {
    Coordinator(self)
}

We also need to define our Coordinator class, so add a nested class inside our ImagePicker struct. (We don't have to nest it inside, but it keeps things simple.)

class Coordinator: NSObject, UIImagePickerControllerDelegate, UINavigationControllerDelegate {

}

 

Add the following two functions inside our Coordinator.

class Coordinator: NSObject, UIImagePickerControllerDelegate, UINavigationControllerDelegate {    

    func imagePickerController(_ picker: UIImagePickerController, didFinishPickingMediaWithInfo info: [UIImagePickerController.InfoKey : Any]) {       

    }

    
    func imagePickerControllerDidCancel(_ picker: UIImagePickerController) {  

    }    

  }

 

The first function is called when an image is selected; it passes the image data back to our ImagePicker and dismisses the sheet.

 

Here is the complete code:

import SwiftUI
import UIKit

struct ImagePicker: UIViewControllerRepresentable {

  // Bindings back to the SwiftUI view: the selected image and
  // whether the sheet is currently presented.
  @Binding var uiImage: UIImage?
  @Binding var isPresenting: Bool

  func makeUIViewController(context: Context) -> UIImagePickerController {
    let imagePicker = UIImagePickerController()
    imagePicker.sourceType = .photoLibrary
    imagePicker.delegate = context.coordinator
    return imagePicker
  }

  func updateUIViewController(_ uiViewController: UIImagePickerController, context: Context) {
  }

  typealias UIViewControllerType = UIImagePickerController

  func makeCoordinator() -> Coordinator {
    Coordinator(self)
  }

  class Coordinator: NSObject, UIImagePickerControllerDelegate, UINavigationControllerDelegate {

    let parent: ImagePicker

    init(_ imagePicker: ImagePicker) {
      self.parent = imagePicker
    }

    // Called when the user picks an image: hand it back to the
    // SwiftUI view and dismiss the sheet.
    func imagePickerController(_ picker: UIImagePickerController, didFinishPickingMediaWithInfo info: [UIImagePickerController.InfoKey : Any]) {
      parent.uiImage = info[.originalImage] as? UIImage
      parent.isPresenting = false
    }

    // Called when the user cancels: just dismiss the sheet.
    func imagePickerControllerDidCancel(_ picker: UIImagePickerController) {
      parent.isPresenting = false
    }

  }

}

 

and ContentView.swift


 

import SwiftUI

struct ContentView: View {

  @State var isPresenting: Bool = false
  @State var uiImage: UIImage?
      
  var body: some View {
    VStack{
      HStack{
        Image(systemName: "photo")
          .onTapGesture {
            isPresenting = true
          }        

        Image(systemName: "camera")
      }
      .font(.largeTitle)
      .foregroundColor(.blue)

      Rectangle()
        .strokeBorder()
        .foregroundColor(.yellow)
        .overlay(
          Group {
            if uiImage != nil {
              Image(uiImage: uiImage!)
                .resizable()
                .scaledToFit()
            }
          }
        )
    }
    .sheet(isPresented: $isPresenting){
      ImagePicker(uiImage: $uiImage, isPresenting: $isPresenting)       
    }     

    .padding()

  }

}



struct ContentView_Previews: PreviewProvider {
  static var previews: some View {
    ContentView()
  }
}



 

And if we run it in our simulator, here is what it might look like:

[Screenshot: the selected image displayed in the SwiftUI view on the simulator]

We've successfully hooked up an image from the photo library to a SwiftUI view.

 

Step 4: Configure the MLModel

This is the easiest part.

Let's create a new file to configure and load our model.

1. Download the model file from Apple's official website:

https://developer.apple.com/machine-learning/models/

I've tried a few models, and MobileNetV2 is a small file that performs reasonably well, so let's download it and copy it over to our project.

You can drag the model file MobileNetV2.mlmodel into the project folder after it's downloaded. Xcode will automatically generate a Swift class named MobileNetV2 for it, which we will use below.

Create a new file, Classifier.swift, and inside the file, add:

import CoreML
import Vision
import CoreImage

struct Classifier {
    
    private(set) var results: String?
    
    mutating func detect(ciImage: CIImage) {
        
        // Wrap the Core ML model so it can be used with the Vision framework.
        guard let model = try? VNCoreMLModel(for: MobileNetV2(configuration: MLModelConfiguration()).model)
        else {
            return
        }
        
        // Build a classification request and run it on the image.
        let request = VNCoreMLRequest(model: model)
        let handler = VNImageRequestHandler(ciImage: ciImage, options: [:])
        try? handler.perform([request])
        
        guard let results = request.results as? [VNClassificationObservation] else {
            return
        }
        
        // Vision returns observations sorted by confidence; keep the top label.
        if let firstResult = results.first {
            self.results = firstResult.identifier
        }
        
    }
    
}

 

In summary, we loaded a pre-trained machine learning model and created a detect function: when a CIImage (Core Image format) is fed in, it stores the most likely classification in the results property.
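Each VNClassificationObservation also carries a confidence score, which is handy for debugging. Here is a small sketch (my own addition, not part of the tutorial code) that formats the top three predictions:

import Foundation
import Vision

// Sketch: format the top predictions with confidence percentages.
// `results` is the [VNClassificationObservation] array obtained from the
// request, exactly as in the detect function above.
func topPredictions(from results: [VNClassificationObservation], count: Int = 3) -> String {
    results
        .prefix(count) // Vision returns observations sorted by confidence
        .map { String(format: "%@ (%.0f%%)", $0.identifier, Double($0.confidence) * 100) }
        .joined(separator: ", ")
}

A typical result might read something like "hotdog (94%), cheeseburger (3%), bagel (1%)".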

 

Step 5: Making a prediction (inference)

To use our classifier, we can add a button in our ContentView.swift file to trigger a classification request.

We will need to initialize a classifier from our Classifier struct. Since Classifier is a struct with a mutating detect method, we store it as an @State property so the view can call detect from the button's action.

Beneath the image, we add a button that feeds the image to our classifier when tapped.


struct ContentView: View {
    @State var isPresenting: Bool = false
    @State var uiImage: UIImage?
    
    // Classifier is a struct with a mutating detect method, so it is stored
    // as @State to let the button's action mutate it.
    @State var classifier = Classifier()
    
    var body: some View {
        VStack{
            HStack{
                Image(systemName: "photo")
                    .onTapGesture {
                        isPresenting = true
                    }
                
                Image(systemName: "camera")
            }
            .font(.largeTitle)
            .foregroundColor(.blue)
            
            Rectangle()
                .strokeBorder()
                .foregroundColor(.yellow)
                .overlay(
                    Group {
                        if uiImage != nil {
                            Image(uiImage: uiImage!)
                                .resizable()
                                .scaledToFit()
                        }
                    }
                )
            
            Button(action: {
                if uiImage != nil {
                    guard let ciImage = CIImage(image: uiImage!) else {
                        print("cannot convert uiimage to ciimage")
                        return
                    }
                    classifier.detect(ciImage: ciImage)
                    // Print the top prediction to the console.
                    print(classifier.results ?? "no result")
                }
            }) {
                Image(systemName: "bolt.fill")
                    .foregroundColor(.red)
                    .font(.title)
            }
        }
        .sheet(isPresented: $isPresenting){
            ImagePicker(uiImage: $uiImage, isPresenting: $isPresenting)
        }
        .padding()
    }
}

If you run the app in the simulator, select an image from your album, and tap the bolt button under the image, it will print the model's most likely prediction to the console.

 

Step 6: Render the classification results in the UI

Let's show the classifier's results directly in the UI; it is straightforward.

So far, we have not separated our code in a nice, clean MVVM way. Let's add a view model as the bridge between our classifier and our UI.

The view model will take the image the user selected in the UI, feed it into our model, retrieve the classification results, clean them up (if needed in the future), and finally present them to the view.

1. Let's first create a new file and name it ImageClassifier.swift:

import SwiftUI

class ImageClassifier: ObservableObject {
    
    // Because Classifier is a struct, its mutating detect method replaces the
    // whole value, which @Published picks up and republishes to the view.
    @Published private var classifier = Classifier()
    
    var imageClass: String? {
        classifier.results
    }
        
    // MARK: Intent(s)
    func detect(uiImage: UIImage) {
        guard let ciImage = CIImage(image: uiImage) else { return }
        classifier.detect(ciImage: ciImage)
    }
        
}

 

2. Now, let's make some changes to our ContentView.swift

import SwiftUI

struct ContentView: View {
    @State var isPresenting: Bool = false
    @State var uiImage: UIImage?
    
    @ObservedObject var classifier: ImageClassifier
    
    var body: some View {
        VStack{
            HStack{
                Image(systemName: "photo")
                    .onTapGesture {
                        isPresenting = true
                        sourceType = .photoLibrary
                    }
                
                Image(systemName: "camera")
            }
            .font(.title)
            .foregroundColor(.blue)
            
            Rectangle()
                .strokeBorder()
                .foregroundColor(.yellow)
                .overlay(
                    Group {
                        if uiImage != nil {
                            Image(uiImage: uiImage!)
                                .resizable()
                                .scaledToFit()
                        }
                    }
                )
            
            
            VStack{
                Button(action: {
                    if uiImage != nil {
                        classifier.detect(uiImage: uiImage!)
                    }
                }) {
                    Image(systemName: "bolt.fill")
                        .foregroundColor(.orange)
                        .font(.title)
                }
                
                
                Group {
                    if let imageClass = classifier.imageClass {
                        HStack{
                            Text("Image categories:")
                                .font(.caption)
                            Text(imageClass)
                                .bold()
                        }
                    } else {
                        HStack{
                            Text("Image categories: NA")
                                .font(.caption)
                        }
                    }
                }
                .font(.subheadline)
                .padding()
                
            }
        }
        
        .sheet(isPresented: $isPresenting){
            ImagePicker(uiImage: $uiImage, isPresenting: $isPresenting)
                .onDisappear{
                    if uiImage != nil {
                        classifier.detect(uiImage: uiImage!)
                    }
                }
            
        }
        
        .padding()
    }
}


struct ContentView_Previews: PreviewProvider {
    static var previews: some View {
        ContentView(classifier: ImageClassifier())
    }
}
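One note: since ContentView now takes a classifier argument, the app's entry point needs to pass one in. A minimal sketch (the App struct name below is hypothetical; use whatever Xcode generated for your project):

import SwiftUI

@main
struct SeeFoodApp: App {
    var body: some Scene {
        WindowGroup {
            ContentView(classifier: ImageClassifier())
        }
    }
}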

 

[Screenshot: image recognition results on the iOS simulator]

 

Step 7: Hook up the device camera to take a live picture

This step is quite straightforward. We add a sourceType state variable so that when the camera icon is tapped, we present the ImagePicker with sourceType set to .camera.

Here is the code for ContentView.swift

import SwiftUI

struct ContentView: View {
    @State var isPresenting: Bool = false
    @State var uiImage: UIImage?
    @State var sourceType: UIImagePickerController.SourceType = .photoLibrary
    
    @ObservedObject var classifier: ImageClassifier
    
    var body: some View {
        VStack{
            HStack{
                Image(systemName: "photo")
                    .onTapGesture {
                        isPresenting = true
                        sourceType = .photoLibrary
                    }
                
                Image(systemName: "camera")
                    .onTapGesture {
                        isPresenting = true
                        sourceType = .camera
                    }
            }
            .font(.title)
            .foregroundColor(.blue)
            
            Rectangle()
                .strokeBorder()
                .foregroundColor(.yellow)
                .overlay(
                    Group {
                        if uiImage != nil {
                            Image(uiImage: uiImage!)
                                .resizable()
                                .scaledToFit()
                        }
                    }
                )
            
            
            VStack{
                Button(action: {
                    if uiImage != nil {
                        classifier.detect(uiImage: uiImage!)
                    }
                }) {
                    Image(systemName: "bolt.fill")
                        .foregroundColor(.orange)
                        .font(.title)
                }
                
                
                Group {
                    if let imageClass = classifier.imageClass {
                        HStack{
                            Text("Image categories:")
                                .font(.caption)
                            Text(imageClass)
                                .bold()
                        }
                    } else {
                        HStack{
                            Text("Image categories: NA")
                                .font(.caption)
                        }
                    }
                }
                .font(.subheadline)
                .padding()
                
            }
        }
        
        .sheet(isPresented: $isPresenting){
            ImagePicker(uiImage: $uiImage, isPresenting: $isPresenting, sourceType: $sourceType)
                .onDisappear{
                    if uiImage != nil {
                        classifier.detect(uiImage: uiImage!)
                    }
                }
            
        }
        
        .padding()
    }
}

struct ContentView_Previews: PreviewProvider {
    static var previews: some View {
        ContentView(classifier: ImageClassifier())
    }
}

and ImagePicker.swift

import SwiftUI
import UIKit


struct ImagePicker: UIViewControllerRepresentable {
    
    @Binding var uiImage: UIImage?        
    @Binding var isPresenting: Bool
    @Binding var sourceType: UIImagePickerController.SourceType
    
    func makeUIViewController(context: Context) -> UIImagePickerController {
        let imagePicker = UIImagePickerController()
        imagePicker.sourceType = sourceType
        imagePicker.delegate = context.coordinator
        return imagePicker
    }
    
    func updateUIViewController(_ uiViewController: UIImagePickerController, context: Context) {
    }
    
    typealias UIViewControllerType = UIImagePickerController
        
    
    func makeCoordinator() -> Coordinator {
        Coordinator(self)
    }
    
    
    class Coordinator: NSObject, UIImagePickerControllerDelegate, UINavigationControllerDelegate {
        
        let parent: ImagePicker
                
        func imagePickerController(_ picker: UIImagePickerController, didFinishPickingMediaWithInfo info: [UIImagePickerController.InfoKey : Any]) {
            parent.uiImage = info[.originalImage] as? UIImage
            parent.isPresenting = false
        }
        
        func imagePickerControllerDidCancel(_ picker: UIImagePickerController) {
            parent.isPresenting = false
        }
        
        init(_ imagePicker: ImagePicker) {
            self.parent = imagePicker
        }
        
    }
    
    
}

Two additional steps are required:

1. Add a privacy setting so that the user can approve the use of the camera. In Info.plist, this is the "Privacy - Camera Usage Description" entry (the NSCameraUsageDescription key).

[Screenshot: camera privacy setting in Info.plist]
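The raw Info.plist entry looks like this (the description string is just an example; write one that fits your app):

<key>NSCameraUsageDescription</key>
<string>SeeFood uses the camera to take pictures of food for classification.</string>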

2. Connect to a physical device: an iPhone.

Since our app requires a real camera, we need to connect to a physical device, e.g., by using a USB cable to connect your iPhone to your computer. After that, we can select the physical device as the run destination in Xcode and run the app on it.

[Screenshot: selecting a physical device as the run destination]
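One caveat: not every run destination has a camera (the simulator, for instance, does not). A small defensive sketch (my own addition, not in the tutorial repo) that falls back to the photo library when no camera is available:

Image(systemName: "camera")
    .onTapGesture {
        // UIImagePickerController can tell us whether a camera is available;
        // fall back to the photo library if not (e.g., in the simulator).
        if UIImagePickerController.isSourceTypeAvailable(.camera) {
            sourceType = .camera
        } else {
            sourceType = .photoLibrary
        }
        isPresenting = true
    }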

Let's try some hotdogs.

[Screenshot: SeeFood detecting a hot dog with SwiftUI, Core ML and Vision]

 

And some pizza!

 

[Screenshot: SeeFood recognizing a pizza with SwiftUI, Core ML and Vision]

 

Conclusion

We've successfully used the latest SwiftUI to create an iOS app that can recognize objects in an image with reasonable accuracy. We've also expanded SeeFood from a binary classifier to a multi-class classifier.

Here is a link to my GitHub repository if you are interested in forking or downloading to play with it.

https://github.com/theleonwei/seefood


Enjoyed this article? We've added a newsletter covering everything going on in generative AI and Apple Vision Pro; subscribe here.