Exploring Firebase MLKit on Android: Face Detection (Part Two)

At Google I/O this year we saw the introduction of Firebase MLKit, a part of the Firebase suite that intends to make it easier for us to add intelligent features to our apps. With this comes the face recognition feature, giving us the ability to recognise faces along with the ‘landmarks’ (nose, eyes etc.) and expressions of those faces. In this post I want to dive into how we can implement this feature in our applications.


There are many cases in our apps where we may want to utilise the detection of faces. Maybe you want to perform facial verification, tag photos or add filters to a camera feed — there are a lot of possibilities. Now that this functionality is available as part of the MLKit library, we can add it to our apps without many of the hurdles that were previously involved. When it comes to facial recognition, MLKit allows us to detect the following features:

  • Coordinates of the base of the nose
  • Coordinates of the right-hand side of the mouth
  • Coordinates of the left-hand side of the mouth
  • Coordinates of the bottom of the mouth
  • Probability that the face is smiling
  • Probability that the right eye is open
  • Probability that the left eye is open
  • Bounds of the detected face
  • Rotation angle of the detected face
  • Tilt angle of the detected face

Wow, that’s a pretty nice list of data that we can retrieve for a detected face! Now, it’s important to note that face recognition from MLKit is only available as on-device recognition — you cannot perform face recognition in the cloud. This is probably a good thing for your users in terms of their personal data, but is also great as it reduces the complexity when it comes to implementing it.


Face recognition with MLKit consists of three core concepts:

To begin with, Face Tracking is the concept of tracking a face based on its features. If a single face is detected, then it can be recognised again across images (or even video frames). This in itself is not facial recognition, but a concept within recognition that allows us to track faces throughout a sequence of media.

There are also different Landmark points that make up a face. These are things such as a single eye (be it left or right), a right cheek, the left-hand side of a mouth — all of these are landmarks on a detected face, and all of them can be recognised using MLKit. The landmarks that can be detected depend on the Euler Y angle of the face (a positive Y angle is when the face is turned to the left, and a negative Y when the face is turned to the right). Here’s what we can detect based on these values:

You can see here that as the head is turned from one side to the other, the features that will be available for detection will vary. This is worth bearing in mind — the great thing is that the Euler angles are accessible from recognised faces, so it will be possible to prompt the user to turn their head in the desired direction to improve the recognition process. Note: These Euler angles will only be available when the detector is being used in accurate mode.

And finally, Classification is the concept of analysing whether or not a particular characteristic is present within a face. For example, in the case of MLKit we can check the probability that the face is smiling. This classification process will only work for frontal faces — meaning ones with a small Euler Y angle.


Before we can start using the face recognition feature of MLKit, we need to begin by adding the dependency to our app-level build.gradle file:

implementation 'com.google.firebase:firebase-ml-vision:16.0.0'

Now, if you want the face recognition part of MLKit to be downloaded at the point of application install, then you can add the following snippet within the application tags of your manifest file. Otherwise, the face recognition part of the MLKit library will be downloaded at the point where it is required within your application.

<meta-data
      android:name="com.google.firebase.ml.vision.DEPENDENCIES"
      android:value="face" />

Now we’ve done the above, we’re ready to add face recognition to our application. When it comes to facial recognition, there are a few different options that we can configure for the recognition process. This is done by means of a FirebaseVisionFaceDetectorOptions instance. We can create a new instance of this using the class builder:

val options = FirebaseVisionFaceDetectorOptions.Builder()

And then this can be configured with a collection of different properties:

  • Detection mode — used to state whether the recognition process should favour speed or accuracy; this can be set to either ACCURATE_MODE or FAST_MODE and defaults to FAST_MODE.
.setModeType(FirebaseVisionFaceDetectorOptions.ACCURATE_MODE)
.setModeType(FirebaseVisionFaceDetectorOptions.FAST_MODE)
  • Landmark detection — used to declare whether the recognition process should recognise facial landmarks such as the nose, eyes, mouth etc. This defaults to NO_LANDMARKS.
.setLandmarkType(FirebaseVisionFaceDetectorOptions.ALL_LANDMARKS)
.setLandmarkType(FirebaseVisionFaceDetectorOptions.NO_LANDMARKS)
  • Feature classification — used to declare whether the recognition process should classify facial features such as whether the face is smiling or the eyes are open. This defaults to NO_CLASSIFICATIONS.
.setClassificationType(FirebaseVisionFaceDetectorOptions.ALL_CLASSIFICATIONS)
.setClassificationType(FirebaseVisionFaceDetectorOptions.NO_CLASSIFICATIONS)
  • Minimum face size — used to define the minimum size of a face (relative to the given image) for it to be detected. This value defaults to 0.1f.
.setMinFaceSize(0.15f)
  • Enable face tracking — used to declare whether or not an ID should be assigned to faces, for tracking faces between images. This defaults to false.
.setTrackingEnabled(true)
.setTrackingEnabled(false)

With these all put together, you’ll have something along the lines of this:

val options = FirebaseVisionFaceDetectorOptions.Builder()
        .setModeType(FirebaseVisionFaceDetectorOptions.FAST_MODE)
        .setLandmarkType(
            FirebaseVisionFaceDetectorOptions.ALL_LANDMARKS)      
        .setClassificationType(
            FirebaseVisionFaceDetectorOptions.ALL_CLASSIFICATIONS)
        .setMinFaceSize(0.15f)
        .setTrackingEnabled(true)
        .build()

If you don’t set any of the options with the builder then they will just be set to their default values that are stated above.


Now that we have our options built, we can go ahead and make use of them in our recognition flow. Before we can perform any form of recognition, though, we need an instance of a FirebaseVisionImage — this is the class which holds our image data ready for the recognition process. In order to create an instance of this we need to use our image data, which can be done in one of five ways:

Bitmap

To begin with, we can create this instance of a FirebaseVisionImage using an instance of a Bitmap. We can do so by passing an upright bitmap into the fromBitmap() function — this will give us back a FirebaseVisionImage

val image = FirebaseVisionImage.fromBitmap(bitmap)

media.Image

We can also do so using a media.Image instance — for example when capturing an image from the device’s camera. When doing so, we must pass in the instance of this image as well as its rotation, so this must be calculated prior to calling the fromMediaImage() function.

val image = FirebaseVisionImage.fromMediaImage(mediaImage, rotation)
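How you work out this rotation depends on your camera setup, but as a rough illustration, a small helper along the lines of the sketch below (degreesToFirebaseRotation() is just a hypothetical name) can map a rotation in degrees onto the FirebaseVisionImageMetadata constants that fromMediaImage() expects:

private fun degreesToFirebaseRotation(degrees: Int): Int = when (degrees) {
    // Map a rotation in degrees to the constants used by the vision APIs.
    // Obtaining the degrees value (e.g. from the camera sensor orientation
    // and the current display rotation) will depend on your camera setup.
    0 -> FirebaseVisionImageMetadata.ROTATION_0
    90 -> FirebaseVisionImageMetadata.ROTATION_90
    180 -> FirebaseVisionImageMetadata.ROTATION_180
    270 -> FirebaseVisionImageMetadata.ROTATION_270
    else -> throw IllegalArgumentException("Rotation must be 0, 90, 180 or 270")
}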

ByteBuffer

An instance can also be created using a ByteBuffer. To do so though we must first create an instance of a FirebaseVisionImageMetadata. This contains the data required to construct the vision image, such as the rotation and measurements.

val metadata = FirebaseVisionImageMetadata.Builder()
        .setWidth(1280)
        .setHeight(720)
        .setFormat(FirebaseVisionImageMetadata.IMAGE_FORMAT_NV21)
        .setRotation(rotation)
        .build()

We can then pass this along with our ByteBuffer to create the instance:

val image = FirebaseVisionImage.fromByteBuffer(buffer, metadata)

ByteArray

Creating an image from a ByteArray behaves in the same way as a ByteBuffer, except we must use the fromByteArray() function instead:

val image = FirebaseVisionImage.fromByteArray(byteArray, metadata)

File

A vision image instance can be created from a file by calling the fromFilePath() function with a context and desired URI.

val image: FirebaseVisionImage? = try {
    FirebaseVisionImage.fromFilePath(context, uri)
} catch (e: IOException) {
    e.printStackTrace()
    null
}

The approach which you use to retrieve the FirebaseVisionImage instance will depend on your application and how you are working with images. However you do so, at this point you should have access to an instance of the FirebaseVisionImage class. What we need to do next is retrieve an instance of the FirebaseVisionFaceDetector class — this class is used to find any instances of a FirebaseVisionFace within our given image.

val detector = FirebaseVision.getInstance()
        .getVisionFaceDetector(options)

Here we pass in the FirebaseVisionFaceDetectorOptions instance that we previously created. Alternatively, we can call the getVisionFaceDetector() function without any arguments, in which case the detector will simply use the default options.

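For reference, the no-argument form would look something like the snippet below, in which case the detector just applies the default values listed earlier:

val detector = FirebaseVision.getInstance()
        .getVisionFaceDetector()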

Now that our detector is configured and ready to go, we can go ahead and use this FirebaseVisionFaceDetector instance to detect the faces in our image. This can be done by calling the detectInImage() function, passing in our FirebaseVisionImage instance:

val result = detector.detectInImage(image)
        .addOnSuccessListener { faces ->
            // Task completed successfully, faces is a List<FirebaseVisionFace>
        }
        .addOnFailureListener {
            // Task failed with an exception
        }

Within the addOnFailureListener we’ll want to handle the error in face detection and let the user know that we couldn’t complete the operation. Within the addOnSuccessListener, on the other hand, we’ll have access to the data that was requested.
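As a rough illustration (assuming we’re inside an Activity and just want to log the error and surface a simple message), the failure branch might end up looking something like this:

detector.detectInImage(image)
        .addOnSuccessListener { faces ->
            // Handle the detected faces, covered below
        }
        .addOnFailureListener { exception ->
            // Assumption: we just log the error and show a simple message;
            // a real app would report this in whatever way fits its UI
            Log.e("FaceDetection", "Face detection failed", exception)
            Toast.makeText(this, "Face detection failed", Toast.LENGTH_SHORT)
                    .show()
        }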

Within this listener we will receive a List<FirebaseVisionFace> instance — we can loop through this list to retrieve the data for each face that was detected. Here’s an image that I ran through the face detection process and the data that I retrieved back from it:

Note: This image doesn’t demonstrate all recognition features.

We can see a collection of different data here, so let’s break it down:

FirebaseVisionFace.boundingBox

With our face instance we get a bounding box which represents the bounds of the detected face. This is in the form of a Rect instance, so can be used to easily draw the box onto the canvas.

val bounds = face.boundingBox
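For instance, if you wanted to outline a detected face over a preview, a minimal sketch along these lines could work. drawFaceBounds() is a hypothetical helper and assumes you have a Canvas to draw onto (for example from a custom overlay view’s onDraw()):

// Hypothetical helper: outline a detected face on a Canvas.
// Assumes the canvas and the analysed image share the same coordinate space;
// in a real app you may need to scale the bounding box to your preview size.
fun drawFaceBounds(canvas: Canvas, face: FirebaseVisionFace) {
    val boxPaint = Paint().apply {
        color = Color.RED
        style = Paint.Style.STROKE
        strokeWidth = 4f
    }
    canvas.drawRect(face.boundingBox, boxPaint)
}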

FirebaseVisionFace.headEulerAngleY / FirebaseVisionFace.headEulerAngleZ

We can also retrieve the rotation of the face based on whether the face is tilted at all — if so, we could use these angles to alter the way in which we handle the other properties given to us by the FirebaseVisionFace instance.

val rotationY = face.headEulerAngleY
val rotationZ = face.headEulerAngleZ
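For example, since some landmarks are only detectable on roughly frontal faces, we could use the Y angle to prompt the user to turn back towards the camera. A rough sketch (the 36 degree threshold here is just an illustrative assumption, not a documented value):

val rotationY = face.headEulerAngleY

// Illustrative threshold only — pick whatever works for your use case
if (Math.abs(rotationY) > 36f) {
    // The head is turned quite far to one side, so some landmarks may not be
    // detectable — e.g. show a hint asking the user to face the camera
}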

FirebaseVisionFace.leftEyeOpenProbability

We can retrieve the probability that the face has its left eye open using the leftEyeOpenProbability property on our face instance. We must first check that the property was not un-computed, using the FirebaseVisionFace.UNCOMPUTED_PROBABILITY field, then we can retrieve the probability value:

if (face.leftEyeOpenProbability !=
        FirebaseVisionFace.UNCOMPUTED_PROBABILITY) {
    val leftEyeOpenProb = face.leftEyeOpenProbability
}

The value returned here will be a probability between 0.0 and 1.0.

FirebaseVisionFace.rightEyeOpenProbability

We can retrieve the probability that the face has its right eye open using the rightEyeOpenProbability property on our face instance. We must first check that the property was not un-computed, using the FirebaseVisionFace.UNCOMPUTED_PROBABILITY field, then we can retrieve the probability value:

if (face.rightEyeOpenProbability !=
        FirebaseVisionFace.UNCOMPUTED_PROBABILITY) {
    val rightEyeOpenProb = face.rightEyeOpenProbability
}

The value returned here will be a probability between 0.0 and 1.0.

FirebaseVisionFace.smilingProbability

We can retrieve the probability that the face is smiling using the smilingProbability property on our face instance. We must first check that the property was not un-computed, using the FirebaseVisionFace.UNCOMPUTED_PROBABILITY field, then we can retrieve the probability value:

if (face.smilingProbability !=   
        FirebaseVisionFace.UNCOMPUTED_PROBABILITY) {
    val smileProb = face.smilingProbability
}

The value returned here will be a probability between 0.0 and 1.0.

FirebaseVisionFaceLandmark.LEFT_MOUTH

We can retrieve the position for the left-hand side of the mouth by retrieving the FirebaseVisionFaceLandmark.LEFT_MOUTH landmark from the vision face instance.

val leftMouth = face.getLandmark(
        FirebaseVisionFaceLandmark.LEFT_MOUTH)
leftMouth?.let {
    val leftMouthPos = it.position
}

FirebaseVisionFaceLandmark.RIGHT_MOUTH

We can retrieve the position for the right-hand side of the mouth by retrieving the FirebaseVisionFaceLandmark.RIGHT_MOUTH landmark from the vision face instance.

val rightMouth = face.getLandmark(
        FirebaseVisionFaceLandmark.RIGHT_MOUTH)
rightMouth?.let {
    val rightMouthPos = it.position
}

FirebaseVisionFaceLandmark.BOTTOM_MOUTH

We can also retrieve the position for the bottom of the mouth by retrieving the FirebaseVisionFaceLandmark.BOTTOM_MOUTH landmark from the vision face instance.

val bottomMouth = face.getLandmark(
        FirebaseVisionFaceLandmark.BOTTOM_MOUTH)
bottomMouth?.let {
    val bottomMouthPos = it.position
}

FirebaseVisionFaceLandmark.LEFT_EAR

We can retrieve the position for the left ear by retrieving the FirebaseVisionFaceLandmark.LEFT_EAR landmark from the vision face instance.

val leftEar = face.getLandmark(
        FirebaseVisionFaceLandmark.LEFT_EAR)
leftEar?.let {
    val leftEarPos = it.position
}

FirebaseVisionFaceLandmark.RIGHT_EAR

We can retrieve the position for the right ear by retrieving the FirebaseVisionFaceLandmark.RIGHT_EAR landmark from the vision face instance.

val rightEar = face.getLandmark(
        FirebaseVisionFaceLandmark.RIGHT_EAR)
rightEar?.let {
    val rightEarPos = it.position
}

FirebaseVisionFaceLandmark.LEFT_CHEEK

We can retrieve the position for the left cheek by retrieving the FirebaseVisionFaceLandmark.LEFT_CHEEK landmark from the vision face instance.

val leftCheek = face.getLandmark(
        FirebaseVisionFaceLandmark.LEFT_CHEEK)
leftCheek?.let {
    val leftCheekPos = it.position
}

FirebaseVisionFaceLandmark.RIGHT_CHEEK

We can retrieve the position for the right cheek by retrieving the FirebaseVisionFaceLandmark.RIGHT_CHEEK landmark from the vision face instance.

val rightCheek = face.getLandmark(
        FirebaseVisionFaceLandmark.RIGHT_CHEEK)
rightCheek?.let {
    val rightCheekPos = it.position
}

FirebaseVisionFaceLandmark.NOSE_BASE

We can also retrieve the position for the base of the nose by retrieving the FirebaseVisionFaceLandmark.NOSE_BASE landmark from the vision face instance.

val noseBase = face.getLandmark(
        FirebaseVisionFaceLandmark.NOSE_BASE)
noseBase?.let {
    val noseBasePos = it.position
}

If we decided to enable face tracking, then we may want to retrieve the ID for the detected face — we can use the trackingId of the face instance to do so. We can first check that this doesn’t equal the INVALID_ID value (this is the value returned if we aren’t tracking faces) and then retrieve the actual value:

if (face.trackingId != FirebaseVisionFace.INVALID_ID) {
    val faceId = face.trackingId
}

Now, what happens if we have multiple faces in the image that we are performing the vision analysis on? As I previously mentioned, we get back a List<FirebaseVisionFace> instance — this means that we can simply loop through these faces and perform the desired operations for each of them. Here’s an image I ran through the process that contained multiple faces:

You can see here that most of the faces were detected, other than the one person looking down at the laptop. We can also see that not all features were detected — I ran the exact same code as the first image, but this probably has something to do with the angles and focus of the faces in the photo (which I’d say is an expected behaviour!).

Now that there are multiple faces in play, we can easily handle the different properties of each one as we get back the individual FirebaseVisionFace instances. If we are tracking faces by ID, then managing the individual faces becomes even simpler.
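To make that concrete, here’s a small sketch of what handling each detected face might look like inside the success listener. The properties pulled out here are just a few of the ones covered above, and what you do with them is entirely down to your app:

faces.forEach { face ->
    // The bounds of this particular face, e.g. for drawing an overlay
    val bounds = face.boundingBox

    if (face.smilingProbability != FirebaseVisionFace.UNCOMPUTED_PROBABILITY) {
        val smileProb = face.smilingProbability
        // e.g. only trigger a capture once every detected face is smiling
    }

    if (face.trackingId != FirebaseVisionFace.INVALID_ID) {
        // With tracking enabled, this ID lets us key per-face state between frames
        val faceId = face.trackingId
    }
}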


It’s important to note that when you’re done with the recognition process you should call close() on the FirebaseVisionFaceDetector instance to release its resources.
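For example, if the detector is held as a property of an Activity, then a lifecycle callback is a reasonable place to do this (a minimal sketch, assuming a detector property created as shown earlier):

override fun onDestroy() {
    super.onDestroy()
    // Release the resources held by the face detector
    detector.close()
}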


Wow, Face Recognition with MLKit is pretty neat isn’t it! From this article I hope you have been able to see just how much simpler Firebase have made this recognition for our applications. What are your thoughts? If you have any comments or questions, please do reach out 🙂
