Object Recognition with CoreML, Vision and SwiftUI on iOS
Introduction
The SwiftUI framework was released about two years ago and has gained tremendous momentum in the iOS developer community.
However, it is still at a relatively early stage. One of the most significant gaps for AI and machine learning practitioners like myself is that there are no native SwiftUI views for camera-related use cases.
In this article, I will show you how to use SwiftUI to wrap UIImagePickerController from the UIKit framework and create an app that tells whether an object is a hot dog or not.
With it, we can pick an existing image from our photo albums or take a new picture using the built-in camera on iOS.
The complete code repo can be downloaded on GitHub:
https://github.com/theleonwei/seefood
Step 1: the original SeeFood app
We will make an app called SeeFood. The original idea came from an episode of Silicon Valley. If you have not watched Silicon Valley, here is a quick summary of what Jian-Yang's app is about.
As you can see, Jian-Yang's app has a fundamental flaw: it can only tell whether an object is a hot dog, so pizza will be classified as a non-hot-dog thing. We will extend our app's prediction categories to a few thousand everyday items, such as different fruits, and significantly broaden SeeFood's use cases.
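As an aside, once we have a multi-class label, the original binary verdict can still be recovered from it. Here is a minimal sketch in Swift (the helper name is ours, not part of the app):

```swift
// Sketch: recover the original SeeFood verdict from a multi-class label.
// Any identifier mentioning "hot dog" (with or without a space) counts.
func seeFoodVerdict(for label: String) -> String {
    let normalized = label.lowercased().replacingOccurrences(of: " ", with: "")
    return normalized.contains("hotdog") ? "Hotdog!" : "Not hotdog."
}
```

For example, `seeFoodVerdict(for: "hot dog")` returns "Hotdog!", while `seeFoodVerdict(for: "pizza")` returns "Not hotdog.".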
Step 2: start the iOS project
Make sure you are creating an iOS app.
Step 3: Wrap up UIImagePickerController
Let's add an image viewer.
In ContentView.swift, let's add two system images; clicking on either of them will bring up a sheet with the image picker interface.
struct ContentView: View {
var body: some View {
HStack{
Image(systemName: "photo")
Image(systemName: "camera")
}
.font(.largeTitle)
.foregroundColor(.blue)
}
}
Next, let's add a placeholder for the image we will feed into the ML model:
struct ContentView: View {
var body: some View {
VStack{
HStack{
Image(systemName: "photo")
Image(systemName: "camera")
}
.font(.largeTitle)
.foregroundColor(.blue)
Rectangle()
.strokeBorder()
.foregroundColor(.yellow)
}
.padding()
}
}
Now let's create our ImagePicker view.
Create a new file: ImagePicker.swift and import both SwiftUI and UIKit
This is what the code looks like, and the ImagePicker class conforms to the UIViewControllerRepresentable protocol.
import SwiftUI
import UIKit
struct ImagePicker: UIViewControllerRepresentable {
}
Xcode will show an error message; click on the red dot, choose Fix, and replace the placeholder type with UIImagePickerController.
There are still errors, so click on the red dot and the Fix button again.
Xcode will automatically add two more functions (required by the UIViewControllerRepresentable protocol).
We will not be using the updateUIViewController function, so we can leave its body blank.
Inside the makeUIViewController function, we need to return a UIImagePickerController object. This is where we initialize the UIImagePickerController object, and we can also customize it with a few options, e.g., choosing between a camera view and an image selection view.
Let's first try out its photo library option.
Here is what our code looks like:
import SwiftUI
import UIKit
struct ImagePicker: UIViewControllerRepresentable {
func makeUIViewController(context: Context) -> UIImagePickerController {
let imagePicker = UIImagePickerController()
imagePicker.sourceType = .photoLibrary
return imagePicker
}
func updateUIViewController(_ uiViewController: UIImagePickerController, context: Context) {
}
typealias UIViewControllerType = UIImagePickerController
}
Now let's try out our ImagePicker view.
Inside ContentView.swift, we need to add a state variable so the system knows when the ImagePicker view is presented.
Add the following:
@State var isPresenting: Bool = false
We need to change its value to launch our image picker when the photo image is clicked.
Image(systemName: "photo")
.onTapGesture {
isPresenting = true
}
And add this immediately after our VStack.
.sheet(isPresented: $isPresenting){
ImagePicker()
}
The code will look like the following:
import SwiftUI
struct ContentView: View {
@State var isPresenting: Bool = false
var body: some View {
VStack{
HStack{
Image(systemName: "photo")
.onTapGesture {
isPresenting = true
}
Image(systemName: "camera")
}
.font(.largeTitle)
.foregroundColor(.blue)
Rectangle()
.strokeBorder()
.foregroundColor(.yellow)
}
.sheet(isPresented: $isPresenting){
ImagePicker()
}
.padding()
}
}
Hit command + R to run it in a simulator, and you should be able to see something like the following.
As you can see, we now can bring up our photo library, but it won't do anything after we select a photo yet. Let's fix this.
Inside our ImagePicker.swift file, we need to add a coordinator class so that UIImagePickerController can notify our ImagePicker struct when the user has interacted with an image.
Let's add a function called makeCoordinator(), which constructs a coordinator that acts as a delegate for UIImagePickerController.
func makeCoordinator() -> Coordinator {
}
We also need to define our Coordinator class, so add a nested class inside our ImagePicker struct. (We don't strictly need to nest it, but we will for simplicity.)
class Coordinator: NSObject, UIImagePickerControllerDelegate, UINavigationControllerDelegate {
}
Add the following two functions inside of our Coordinator.
class Coordinator: NSObject, UIImagePickerControllerDelegate, UINavigationControllerDelegate {
func imagePickerController(_ picker: UIImagePickerController, didFinishPickingMediaWithInfo info: [UIImagePickerController.InfoKey : Any]) {
}
func imagePickerControllerDidCancel(_ picker: UIImagePickerController) {
}
}
The first function is called when an image is selected; it passes the image data back through the delegate.
Here is the complete code:
import SwiftUI
import UIKit
struct ImagePicker: UIViewControllerRepresentable {
@Binding var uiImage: UIImage?
@Binding var isPresenting: Bool
func makeUIViewController(context: Context) -> UIImagePickerController {
let imagePicker = UIImagePickerController()
imagePicker.sourceType = .photoLibrary
imagePicker.delegate = context.coordinator
return imagePicker
}
func updateUIViewController(_ uiViewController: UIImagePickerController, context: Context) {
}
typealias UIViewControllerType = UIImagePickerController
func makeCoordinator() -> Coordinator {
Coordinator(self)
}
class Coordinator: NSObject, UIImagePickerControllerDelegate, UINavigationControllerDelegate {
let parent: ImagePicker
func imagePickerController(_ picker: UIImagePickerController, didFinishPickingMediaWithInfo info: [UIImagePickerController.InfoKey : Any]) {
parent.uiImage = info[.originalImage] as? UIImage
parent.isPresenting = false
}
func imagePickerControllerDidCancel(_ picker: UIImagePickerController) {
parent.isPresenting = false
}
init(_ imagePicker: ImagePicker) {
self.parent = imagePicker
}
}
}
and ContentView.swift
import SwiftUI
struct ContentView: View {
@State var isPresenting: Bool = false
@State var uiImage: UIImage?
var body: some View {
VStack{
HStack{
Image(systemName: "photo")
.onTapGesture {
isPresenting = true
}
Image(systemName: "camera")
}
.font(.largeTitle)
.foregroundColor(.blue)
Rectangle()
.strokeBorder()
.foregroundColor(.yellow)
.overlay(
Group {
if uiImage != nil {
Image(uiImage: uiImage!)
.resizable()
.scaledToFit()
}
}
)
}
.sheet(isPresented: $isPresenting){
ImagePicker(uiImage: $uiImage, isPresenting: $isPresenting)
}
.padding()
}
}
struct ContentView_Previews: PreviewProvider {
static var previews: some View {
ContentView()
}
}
And if we run it in our simulator, here is what it might look like:
We've successfully hooked up an image from our photo library to a SwiftUI view.
Step 4: Configure the MLModel
This is the easiest part.
Let's create a new file to configure and load our model.
1. Download the model file from Apple's official website:
https://developer.apple.com/machine-learning/models/
I've tried a few models, and MobileNetV2 seems to be a small file that performs reasonably well, so let's download it and copy it over to our project.
You can drag the model file MobileNetV2.mlmodel into the project folder after it's downloaded.
Create a new file Classifier.swift and add the following inside:
import CoreML
import Vision
import CoreImage
struct Classifier {
private(set) var results: String?
mutating func detect(ciImage: CIImage) {
guard let model = try? VNCoreMLModel(for: MobileNetV2(configuration: MLModelConfiguration()).model)
else {
return
}
let request = VNCoreMLRequest(model: model)
let handler = VNImageRequestHandler(ciImage: ciImage, options: [:])
try? handler.perform([request])
guard let results = request.results as? [VNClassificationObservation] else {
return
}
if let firstResult = results.first {
self.results = firstResult.identifier
}
}
}
In summary, we loaded a pre-trained machine learning model and created a detect function: when a CIImage (Core Image format) is fed into it, we set our results property to the most likely classification.
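Note that each VNClassificationObservation also carries a confidence score, not just an identifier. If we later want to show more than the single top label, the ranking and formatting logic is plain Swift; here is a minimal sketch operating on (identifier, confidence) pairs (the helper is hypothetical, mirroring the properties the observations expose):

```swift
// Hypothetical helper: rank (identifier, confidence) pairs and format the
// top-k results, mirroring the `identifier` and `confidence` properties
// that VNClassificationObservation exposes.
func topResults(_ observations: [(identifier: String, confidence: Double)],
                k: Int = 3) -> String {
    observations
        .sorted { $0.confidence > $1.confidence }   // highest confidence first
        .prefix(k)
        .map { "\($0.identifier) (\(Int(($0.confidence * 100).rounded()))%)" }
        .joined(separator: ", ")
}
```

For example, `topResults([(identifier: "hot dog", confidence: 0.92), (identifier: "pizza", confidence: 0.05)])` returns "hot dog (92%), pizza (5%)".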
Step 5: Making a prediction (inference)
To use our classifier, we can add a button to trigger a classification request in our ContentView.swift file:
We will need to initialize a classifier by calling our Classifier struct.
Beneath the image, we add a button to feed the image to our classifier when it's clicked.
struct ContentView: View {
@State var isPresenting: Bool = false
@State var uiImage: UIImage?
@State var classifier = Classifier()
var body: some View {
VStack{
HStack{
Image(systemName: "photo")
.onTapGesture {
isPresenting = true
}
Image(systemName: "camera")
}
.font(.largeTitle)
.foregroundColor(.blue)
Rectangle()
.strokeBorder()
.foregroundColor(.yellow)
.overlay(
Group {
if uiImage != nil {
Image(uiImage: uiImage!)
.resizable()
.scaledToFit()
}
}
)
Button(action: {
if uiImage != nil {
guard let ciImage = CIImage(image: uiImage!) else {
print("cannot convert uiimage to ciimage")
return
}
classifier.detect(ciImage: ciImage)
print(classifier.results ?? "unknown")
}
}) {
Image(systemName: "bolt.fill")
.foregroundColor(.red)
.font(.title)
}
}
.sheet(isPresented: $isPresenting){
ImagePicker(uiImage: $uiImage, isPresenting: $isPresenting)
}
.padding()
}
}
If you run the simulator, select an image from your album, and then click the bolt button under the image, the console will print out the most likely object the model predicts.
Step 6: Render the classification results in the UI
Let's show the classifier results directly in the UI; it is straightforward.
So far, we have not separated our code in a nice, clean MVVM way. Let's add a view model as the bridge between our classifier and our UI.
The view model will take the image a user has selected in the UI, feed it into our model, retrieve the classification results, clean them up (if needed in the future), and finally present them to the view.
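One concrete "clean up" we may eventually want: ImageNet-style identifiers often come as comma-separated synonym lists (e.g., "hotdog, hot dog, red hot"). Here is a minimal sketch of trimming such a label down to its first synonym (the helper name is ours, not part of the app):

```swift
import Foundation

// Hypothetical clean-up step for the view model: keep only the first
// comma-separated synonym of an ImageNet-style identifier and trim spaces.
func cleanLabel(_ identifier: String) -> String {
    guard let first = identifier.split(separator: ",").first else {
        return identifier
    }
    return first.trimmingCharacters(in: .whitespaces)
}
```

For example, `cleanLabel("hotdog, hot dog, red hot")` returns "hotdog", and labels without synonyms pass through unchanged.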
1. Let's first create a new file and name it ImageClassifier.swift:
import SwiftUI
class ImageClassifier: ObservableObject {
@Published private var classifier = Classifier()
var imageClass: String? {
classifier.results
}
// MARK: Intent(s)
func detect(uiImage: UIImage) {
guard let ciImage = CIImage(image: uiImage) else { return }
classifier.detect(ciImage: ciImage)
}
}
2. Now, let's make some changes to our ContentView.swift
import SwiftUI
struct ContentView: View {
@State var isPresenting: Bool = false
@State var uiImage: UIImage?
@ObservedObject var classifier: ImageClassifier
var body: some View {
VStack{
HStack{
Image(systemName: "photo")
.onTapGesture {
isPresenting = true
}
Image(systemName: "camera")
}
.font(.title)
.foregroundColor(.blue)
Rectangle()
.strokeBorder()
.foregroundColor(.yellow)
.overlay(
Group {
if uiImage != nil {
Image(uiImage: uiImage!)
.resizable()
.scaledToFit()
}
}
)
VStack{
Button(action: {
if uiImage != nil {
classifier.detect(uiImage: uiImage!)
}
}) {
Image(systemName: "bolt.fill")
.foregroundColor(.orange)
.font(.title)
}
Group {
if let imageClass = classifier.imageClass {
HStack{
Text("Image categories:")
.font(.caption)
Text(imageClass)
.bold()
}
} else {
HStack{
Text("Image categories: NA")
.font(.caption)
}
}
}
.font(.subheadline)
.padding()
}
}
.sheet(isPresented: $isPresenting){
ImagePicker(uiImage: $uiImage, isPresenting: $isPresenting)
.onDisappear{
if uiImage != nil {
classifier.detect(uiImage: uiImage!)
}
}
}
.padding()
}
}
struct ContentView_Previews: PreviewProvider {
static var previews: some View {
ContentView(classifier: ImageClassifier())
}
}
Step 7: Hook up the device camera to take a live picture
This step is quite straightforward. We need to add another source type so that when the camera button is tapped, we show the ImagePicker with sourceType set to .camera.
Here is the code for ContentView.swift
import SwiftUI
struct ContentView: View {
@State var isPresenting: Bool = false
@State var uiImage: UIImage?
@State var sourceType: UIImagePickerController.SourceType = .photoLibrary
@ObservedObject var classifier: ImageClassifier
var body: some View {
VStack{
HStack{
Image(systemName: "photo")
.onTapGesture {
isPresenting = true
sourceType = .photoLibrary
}
Image(systemName: "camera")
.onTapGesture {
isPresenting = true
sourceType = .camera
}
}
.font(.title)
.foregroundColor(.blue)
Rectangle()
.strokeBorder()
.foregroundColor(.yellow)
.overlay(
Group {
if uiImage != nil {
Image(uiImage: uiImage!)
.resizable()
.scaledToFit()
}
}
)
VStack{
Button(action: {
if uiImage != nil {
classifier.detect(uiImage: uiImage!)
}
}) {
Image(systemName: "bolt.fill")
.foregroundColor(.orange)
.font(.title)
}
Group {
if let imageClass = classifier.imageClass {
HStack{
Text("Image categories:")
.font(.caption)
Text(imageClass)
.bold()
}
} else {
HStack{
Text("Image categories: NA")
.font(.caption)
}
}
}
.font(.subheadline)
.padding()
}
}
.sheet(isPresented: $isPresenting){
ImagePicker(uiImage: $uiImage, isPresenting: $isPresenting, sourceType: $sourceType)
.onDisappear{
if uiImage != nil {
classifier.detect(uiImage: uiImage!)
}
}
}
.padding()
}
}
struct ContentView_Previews: PreviewProvider {
static var previews: some View {
ContentView(classifier: ImageClassifier())
}
}
and ImagePicker.swift
import SwiftUI
import UIKit
struct ImagePicker: UIViewControllerRepresentable {
@Binding var uiImage: UIImage?
@Binding var isPresenting: Bool
@Binding var sourceType: UIImagePickerController.SourceType
func makeUIViewController(context: Context) -> UIImagePickerController {
let imagePicker = UIImagePickerController()
imagePicker.sourceType = sourceType
imagePicker.delegate = context.coordinator
return imagePicker
}
func updateUIViewController(_ uiViewController: UIImagePickerController, context: Context) {
}
typealias UIViewControllerType = UIImagePickerController
func makeCoordinator() -> Coordinator {
Coordinator(self)
}
class Coordinator: NSObject, UIImagePickerControllerDelegate, UINavigationControllerDelegate {
let parent: ImagePicker
func imagePickerController(_ picker: UIImagePickerController, didFinishPickingMediaWithInfo info: [UIImagePickerController.InfoKey : Any]) {
parent.uiImage = info[.originalImage] as? UIImage
parent.isPresenting = false
}
func imagePickerControllerDidCancel(_ picker: UIImagePickerController) {
parent.isPresenting = false
}
init(_ imagePicker: ImagePicker) {
self.parent = imagePicker
}
}
}
Two additional steps are required:
1. Add a privacy setting so the user can approve camera use: in your target's Info.plist, add the "Privacy - Camera Usage Description" key (NSCameraUsageDescription) with a short message explaining why the app needs the camera.
2. Connect a physical device: an iPhone.
Since our app requires a real camera, we need to connect a physical device, i.e., use a USB cable to connect your computer to an iPhone. After that, select the iPhone as the run destination in Xcode and run the app.
Let's try some hotdogs.
And some pizza!
Conclusion
We've successfully used SwiftUI to create an iOS app that can recognize objects in an image with reasonable accuracy. We've also expanded SeeFood's capability from a binary classifier to a multi-class classifier.
Here is a link to my GitHub repository if you are interested in forking or downloading to play with it.
https://github.com/theleonwei/seefood