Landmark recognition may not be applicable to every application, but when it is required it can be a tricky feature to implement, as there are a lot of different things to think about and analyse. Whilst we may have the location of the device available to us, analysing the structure within an image is still a difficult task. The aim of this functionality in Firebase MLKit is to make this process much simpler for us as developers. Using the landmark recognition feature we can pass an image to the Firebase MLKit vision reference and be returned data about any recognised landmarks to use in our application.
Why would we want to do this? Landmark recognition has a number of uses. For example, maybe we perform image tagging in our application and want to streamline this process by automatically tagging the landmarks or locations in these images. Maybe our imaging application makes use of image metadata and we want to enrich this data with landmark information. Or maybe content from our app is shared, messages are sent or video calls are made, and we want to personalise these experiences further by providing this extra data within them. I'm sure there are many other use cases for this functionality which you may discover after playing with the API for yourself 🙂
Note: It’s important to know that MLKit landmark recognition is only available as a cloud feature, meaning that offline recognition cannot be performed. This is because the Google Cloud Vision API is required to perform the recognition process.
Before we can start using the landmark recognition feature of MLKit, we need to begin by adding the dependency to our app-level build.gradle file:
implementation 'com.google.firebase:firebase-ml-vision:16.0.0'
Now, the next thing we need to do is enable the Cloud Vision API, as this is required in order to use this recognition feature. First of all we need to upgrade our Firebase project to the Blaze plan, because cloud-based recognition is only available on paid plans. Blaze is a pay-as-you-go plan, so if you just want to try the API out then you can always downgrade the plan once you're done.
Note: You can use the recognition process up to 1000 times without being charged.
Next we need to enable the Cloud Vision API for our Firebase project over in the Google API Console.
Now that this is done, we have access to the Cloud Vision API from our Firebase project. You must do this before you can make use of landmark recognition; until you do so, the detectInImage() function will return an error stating that you need to enable billing in your Firebase project.
When it comes to this recognition, the recogniser will use the default recognition model (known as STABLE) and return the top 10 results for the process. However, we can customise this by providing a FirebaseVisionCloudDetectorOptions instance to use in the recognition process:
val options = FirebaseVisionCloudDetectorOptions.Builder()
    .setModelType(FirebaseVisionCloudDetectorOptions.LATEST_MODEL)
    .setMaxResults(15)
    .build()
This way we can customise the maximum number of results that we wish to receive when we carry out recognition on our image. You may wish to keep the number of returned results to a minimum, or request a larger amount so that you have a wider range of results to analyse.
Now that we have our options built, we can go ahead and make use of them in our recognition flow. Before that, though, we need to create an instance of a FirebaseVisionImage, the class which holds our image data ready for the recognition process. We need this instance before we can perform any form of recognition, and in order to create one we need to provide our image data. This can be done in one of five ways:
Bitmap
To begin with, we can create this instance of a FirebaseVisionImage using a Bitmap. We do so by passing an upright bitmap into the fromBitmap() function, which will give us back a FirebaseVisionImage:
val image = FirebaseVisionImage.fromBitmap(bitmap)
media.Image
We can also do so using a media.Image instance, for example when capturing an image from the device's camera. When doing so, we must pass the instance of the image as well as its rotation, so this must be calculated prior to calling the fromMediaImage() function (one rough way to do this is sketched after the snippet below).
val image = FirebaseVisionImage.fromMediaImage(mediaImage, rotation)
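How you calculate this rotation depends on where the image comes from. As a very rough illustration (the helper name here is my own, and this simplified mapping ignores the camera sensor orientation, which a production app would also need to take into account), the current display rotation could be translated to the metadata constants like so:
fun imageRotation(displayRotation: Int): Int = when (displayRotation) {
    // Map the Surface rotation reported by the display to the constants
    // that the FirebaseVisionImage APIs expect
    Surface.ROTATION_0 -> FirebaseVisionImageMetadata.ROTATION_0
    Surface.ROTATION_90 -> FirebaseVisionImageMetadata.ROTATION_90
    Surface.ROTATION_180 -> FirebaseVisionImageMetadata.ROTATION_180
    Surface.ROTATION_270 -> FirebaseVisionImageMetadata.ROTATION_270
    else -> FirebaseVisionImageMetadata.ROTATION_0
}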
ByteBuffer
An instance can also be created using a ByteBuffer. To do so, though, we must first create an instance of a FirebaseVisionImageMetadata. This contains the data required to construct the vision image, such as the rotation and the image dimensions.
val metadata = FirebaseVisionImageMetadata.Builder()
    .setWidth(1280)
    .setHeight(720)
    .setFormat(FirebaseVisionImageMetadata.IMAGE_FORMAT_NV21)
    .setRotation(rotation)
    .build()
We can then pass this along with our ByteBuffer to create the instance:
val image = FirebaseVisionImage.fromByteBuffer(buffer, metadata)
ByteArray
Creating an image from a ByteArray behaves in the same way as with a ByteBuffer, except we must use the fromByteArray() function instead:
val image = FirebaseVisionImage.fromByteArray(byteArray, metadata)
File
A vision image instance can be created from a file by calling the fromFilePath() function with a context and desired URI.
val image: FirebaseVisionImage? = try {
    FirebaseVisionImage.fromFilePath(context, uri)
} catch (e: IOException) {
    e.printStackTrace()
    null
}
Now that we have the image instance which we wish to analyse, we are ready to perform landmark recognition. Whether or not we are providing our own options instance, we can now prepare our FirebaseVision detector. If we are using the default options, we can simply retrieve the instance as it is:
val detector = FirebaseVision.getInstance()
    .visionCloudLandmarkDetector
Otherwise we can pass in our options reference when retrieving the detector:
val detector = FirebaseVision.getInstance()
    .getVisionCloudLandmarkDetector(options)
Next, we simply need to call the detectInImage() function on our detector instance, passing in the reference to our image that we previously prepared:
detector.detectInImage(image)
    .addOnSuccessListener {
        // Task succeeded!
        for (landmark in it) {
            // Do something with landmark
        }
    }
    .addOnFailureListener {
        // Task failed with an exception
    }
Now, if this call succeeds then we will be given a list of FirebaseVisionCloudLandmark instances. If no landmarks have been detected then this list will be empty, so be sure to handle that case. Otherwise, we have access to a collection of landmarks to do something with. Each FirebaseVisionCloudLandmark instance gives us access to a collection of properties (a sketch putting them together follows the list below):
- getBoundingBox() — Retrieve the region of the image which contains the recognised landmark
val bounds = landmark.boundingBox
- getLandmark() — Retrieve the name of the detected landmark
val landmarkName = landmark.landmark
- getConfidence() — Retrieve the confidence that the given result matches the provided image
val confidence = landmark.confidence
- getLocations() — Retrieve a list of FirebaseVisionLatLng instances which represent locations related to the result, such as the location of the landmark and the location from where the photo was taken.
val locations = landmark.locations
- getEntityId() — Retrieve the entity ID for the recognised landmark
val entityId = landmark.entityId
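Putting these together, a success listener might end up looking something like the sketch below. This is just an illustration (the log tag and messages are my own), but it shows the empty-result case being handled alongside each of the properties above:
detector.detectInImage(image)
    .addOnSuccessListener { landmarks ->
        if (landmarks.isEmpty()) {
            // No landmarks were recognised in this image
            return@addOnSuccessListener
        }
        for (landmark in landmarks) {
            val bounds = landmark.boundingBox        // region of the image containing the landmark
            val name = landmark.landmark             // name of the detected landmark
            val confidence = landmark.confidence     // how confident the recogniser is in this match
            val entityId = landmark.entityId
            for (location in landmark.locations) {
                // Each location is a FirebaseVisionLatLng instance
                Log.d("Landmark", "$name ($confidence) at ${location.latitude}, ${location.longitude}")
            }
        }
    }
    .addOnFailureListener { exception ->
        Log.e("Landmark", "Landmark recognition failed", exception)
    }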
If we then run a few images through this recognition flow, we can see the kind of results we get from some well-known landmarks:
It looks like MLKit is pretty confident in its result there. I've used the bounding box to draw around the recognised landmark and then simply shown the landmarkName and confidence properties beneath. Let's try running a picture of Big Ben through the recogniser:
It's not quite so confident on this one, and we can also see that multiple instances of the landmark have been detected. Because I am looping through the landmarks, the last instance has been used to display the confidence. Let's try this again, but with just the first result:
This is a little better, and it's also easier to read with just the single bounding box. In practice, you would likely take all of the results for the recognised landmark and combine their confidence values to decide how confident your app should be about that landmark; one rough way to do this is sketched below.
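As a rough sketch of what that might look like (the exact strategy is up to you, and the landmarks value here is assumed to be the list received in the success listener), this example simply keeps the highest confidence reported for each landmark name:
// Collapse duplicate detections of the same landmark into a single confidence value,
// keeping the highest confidence reported for each landmark name
val confidenceByLandmark = landmarks
    .groupBy { it.landmark }
    .mapValues { (_, detections) -> detections.fold(0f) { best, detection -> maxOf(best, detection.confidence) } }

// The entry with the highest combined confidence is our best guess for this image
val bestMatch = confidenceByLandmark.entries.sortedByDescending { it.value }.firstOrNull()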
Finally, let’s run another image through this:
I wanted to run this one through so that we could see whether MLKit would still recognise this famous landmark if we only showed it a partial image:
Now, I know this landmark is pretty well known, so that is probably why there is still a high confidence for a partial image, but it's great to see that it still has the capability to recognise the landmark in this state (and it's actually more confident that the landmark is the Eiffel Tower!).
I hope this article has helped you learn what Firebase MLKit landmark recognition is and how we can set it up for use within our applications. It's definitely worth giving landmark recognition a try if it's a feature that you can (or do) make use of. It's unfortunate that this requires a paid Firebase plan, but depending on your use case that might not be an issue.
I’d love to hear how you get on with this API — if you have any thoughts, questions or comments then please do get in touch 🙂