Build Facial Recognition Model using TensorFlow & Machine Learning

Building and deploying machine learning models has never been easier

Dhananjay Trivedi
11 min read · Aug 7, 2019

What is Machine Learning?

Machine learning is an application of artificial intelligence (AI) that gives systems the ability to learn and improve from experience automatically, without being explicitly programmed. It focuses on developing computer programs that can access data and use it to learn for themselves. As the computer/machine-learning model learns, it forms ‘inference rules’ by finding common patterns in the input that lead to the desired output.

You can think of a machine learning model as a black box: you give it inputs and the desired outputs. The black box forms its own understanding/rules, so that when you give it a similar input in the future, it infers a similar output. This is how ‘intelligence’ is built into the computer.

How does a machine learn?

A machine learning model is made up of nodes, which are similar to the neurons in our brains. These nodes are structured into layers: an input layer, one or more hidden layers, and an output layer.

The input layer takes the input, pre-processes it for the next layers, and passes it on to the hidden layer.

The hidden layer can itself contain multiple layers, which do the actual inferencing/processing that turns the input into the output. Each node of the model has an associated ‘weight’ (just like the neurons in our brain). These weights are tuned while the model is being trained, until we get the desired accuracy in the output.

The output layer takes the inferred result from the hidden layer and presents the output in the desired format.
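
Just to make this layer structure concrete, here is a tiny, hypothetical Keras model (not part of the facial recognition project, and the layer sizes are arbitrary) with an input layer, hidden layers, and an output layer:

```python
import tensorflow as tf

# A toy network purely to illustrate input -> hidden -> output.
toy_model = tf.keras.Sequential([
    tf.keras.layers.Dense(16, activation='relu', input_shape=(4,)),  # input feeding the first hidden layer
    tf.keras.layers.Dense(8, activation='relu'),                     # another hidden layer
    tf.keras.layers.Dense(3, activation='softmax'),                  # output layer: one node per class
])
toy_model.summary()
```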

What is TensorFlow?

TensorFlow is a multipurpose machine learning framework. TensorFlow can be used anywhere from training huge models across clusters in the cloud to running models locally on an embedded system like your phone/IoT devices.

Machine learning has been around for a while, and there are open-source libraries such as TensorFlow where you can find plenty of pre-trained models and build cool stuff on top of them without starting from scratch. What we are trying to achieve here falls under image classification: our machine learning model has to classify the faces in the images among the people it has been trained to recognize.

We will use ‘Transfer Learning’ — Wait, What is that?

In many cases, we just need to find a model that does a similar task, say recognising celebrities (which is also a type of image classification), and retrain that model with our own data. This is called transfer learning.

In ‘Transfer Learning’ you retrain only the last layers of the model’s CNN (Convolutional Neural Network) with your own training data.

The beginning and intermediate layers do a lot of generic (bottleneck) work that you don’t have to perform again and again. That work was already done when these models were built and trained, which saves us time and computation power. Thanks to the good people!

Finding the Model

There is a GitHub repo called the TensorFlow Model Zoo where you can find pre-trained models. There are a few factors to consider while choosing your model, most importantly its speed (in milliseconds per inference) and its accuracy. If you are building something that works in real time, such as a live camera stream, you need speed, because every frame has to be processed and a slow model makes for a bad user experience. There is an obvious tradeoff between speed and accuracy, so this is one of the things to look out for while choosing your model.

The models we find there have had their starting CNN layers trained on a large amount of data (around 1.4 million images), so when training with our own dataset we at least don’t need millions of images of our own.

Preparing the Image Dataset

We will be building our facial recognition model using Keras (a Python library) and MobileNetV2 (a model built by Google). Training a model of your own requires a good amount of diverse data.

Just to give you an idea: in one of the Google Colab examples where flowers are classified, at least 600 images per flower are used to train the model. The model will still work with around 50 photographs per class, but it won’t be very accurate. To improve accuracy, you need more, and more ‘diverse’, photographs.

Typically, 80% of your data is used for training the model and the remaining 20% for validation/testing.

Creating & Training the Model

~ In 7 simple steps. A link to the full script is given at the bottom.

We will be creating a model using Keras and MobileNetV2. We will explain the steps as simply as possible, but it still requires some understanding of neural networks / soft computing.

We created 3 folders inside our project directory and named each one with the ‘ID’ of the corresponding face.

We are building a facial recognition system, so for a start we have 3 people. For each person, we create a folder and move all of his/her images into it. We have around 80 images per person. Rename each folder to that person’s name or ID, it’s up to you (this name will be the desired output for those images).

Also, we will be using the PyCharm IDE by JetBrains; feel free to use whatever environment you are comfortable in.

P.S. You will need to install some packages/dependencies such as TensorFlow and NumPy as you go. This is how you install them:

pip install package_name

Example:

pip install tensorflow
pip install tensorflow-hub
pip install numpy

Alright, let’s look into the ModelTraining.py script.

#1 Importing Dependencies

Just install the dependencies using the command above in the terminal. These are the dependencies we need to import.
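
As a rough sketch (the exact import list in the original ModelTraining.py may differ slightly), the top of the script looks something like this:

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras.preprocessing.image import ImageDataGenerator
```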

#2 Preparing Data and Generators

  1. First, we define our base directory, which contains all the training images.
  2. We reduce the resolution of the images without losing the ‘features’. For facial recognition, an image size of at least 224 seems to work; you can increase or decrease it depending on your requirements.
  3. The batch size depends on the total number of images you have. I have 50 images per person (which still won’t give a very accurate result), so I am setting my batch size to 5; the images are then processed and trained on in batches of 5 over the 10 epochs/iterations.
  4. We need a data generator that rescales the images as part of pre-processing, so they are ready for feature extraction and the other operations in the next steps.
  5. We separate the dataset into training, validation and testing sets (mostly you will see just training and validation). We need generators for these, so here we build a train generator and a validation generator.
  6. In the for-loop at the end, we run the training generator over the batches (see the sketch after this list).
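
Here is a minimal sketch of step #2, continuing from the imports above and assuming a folder layout of base_dir/<person_id>/*.jpg; the names base_dir, IMAGE_SIZE and BATCH_SIZE are illustrative rather than copied from the original script:

```python
base_dir = 'dataset'   # one sub-folder per person, named with that person's ID
IMAGE_SIZE = 224       # images are resized to 224x224
BATCH_SIZE = 5

# Rescale pixel values to [0, 1] and reserve 20% of the images for validation.
datagen = ImageDataGenerator(rescale=1. / 255, validation_split=0.2)

train_generator = datagen.flow_from_directory(
    base_dir,
    target_size=(IMAGE_SIZE, IMAGE_SIZE),
    batch_size=BATCH_SIZE,
    subset='training')

val_generator = datagen.flow_from_directory(
    base_dir,
    target_size=(IMAGE_SIZE, IMAGE_SIZE),
    batch_size=BATCH_SIZE,
    subset='validation')

# Pull one batch just to check the shapes of the images and labels.
for image_batch, label_batch in train_generator:
    break
```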

#3 Preparing Labels

  1. Here we print out all the class indices; these are the unique output names we want for each person. If you trained on images of ’n’ people, you will see ’n’ names/unique IDs printed in the console. This just logs the desired outputs we are training our model for.
  2. A file called ‘labels.txt’ is then generated, containing the list of labels/indices we printed above. This file will be used along with our trained model when inferencing in the future.
  3. We simply write the labels out to a file; the sketch after this list shows how to do it in Python.
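
A sketch of what step #3 might look like, continuing from the generators above (the file name labels.txt is the one described in the text):

```python
# The class indices map each person's folder name to an integer index.
print(train_generator.class_indices)

# Write one label per line, ordered by index, so the file lines up with the model outputs.
labels = '\n'.join(sorted(train_generator.class_indices,
                          key=train_generator.class_indices.get))

with open('labels.txt', 'w') as f:
    f.write(labels)
```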

#4 Creating a Base-Model from MobileNetV2

We have to create the base model from the pre-trained CNN model MobileNetV2. We will be training this base model with our training data.

This model was developed at Google and pre-trained on ImageNet, a large dataset of 1.4 million web images across 1000 classes. That training built the input layer and the early hidden layers responsible for extracting features from an image, which is exactly the redundant bottleneck work we don’t need to repeat. Hence we will train only the layers responsible for classification (part of the hidden layers and the output layer).

Quoting Google’s Colab example: “First, we have to pick which intermediate layer of MobileNetV2 will be used for feature extraction. A common practice is to use the output of the very last layer before the flatten operation, the so-called ‘bottleneck layer’. The reasoning is that the following fully-connected layers will be too specialized to the task the network was trained on, so the features learned by those layers won’t be very useful for a new task; the bottleneck layer, however, retains much more generality. Let’s instantiate a MobileNetV2 model pre-loaded with weights trained on ImageNet. By specifying the `include_top=False` argument, we load a network that doesn’t include the classification layers at the top, which is ideal for feature extraction.”

  1. We define IMG_SHAPE for the resolution of the images: (image width, image height, 3), where the 3 accommodates the RGB channels of a colour image.
  2. We create a base model from the pre-trained MobileNetV2. We pass the parameter ‘include_top=False’, which drops the original classification layers at the top of the network and keeps only the convolutional base that does the generic feature-extraction (bottleneck) work. We also define input_shape and the weights to be used. So we have a base model without the original classification layers.
  3. We will be adding our own classification rules on top of this model, and we don’t want that to disturb the already-trained layers, so we set trainable to False (see the sketch after this list).
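
A sketch of step #4, continuing from above and using the standard Keras applications API:

```python
IMG_SHAPE = (IMAGE_SIZE, IMAGE_SIZE, 3)   # width, height, 3 channels for RGB

# Load MobileNetV2 pre-trained on ImageNet, without its original classification head.
base_model = tf.keras.applications.MobileNetV2(
    input_shape=IMG_SHAPE,
    include_top=False,
    weights='imagenet')

# Freeze the pre-trained layers so our training doesn't disturb them.
base_model.trainable = False
```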

#5 Adding Classification Nodes

We are adding some additional classification heads/nodes of our own to the base model.

  1. We pass in our base model (top layer removed, remaining layers untrainable). Making layers untrainable is also called freezing the model/layers.
  2. A 2D convolution layer (32 nodes, kernel size 3, plus an activation function). You can read more about how CNNs work and about the role of activation functions and kernels.
  3. Not all nodes will contribute to the final output, so we don’t need all of them: the probability of each ‘non-contributing’ node being dropped is set to 20%.
  4. Pooling is then applied: adjacent values in the feature maps are averaged, which reduces the data size.
  5. All the steps above are transformation layers; this is the main Dense layer. A Dense layer takes input from all the previous nodes and feeds all the following nodes; it is very densely connected, hence the name. We use the activation function called ‘softmax’ here; ‘relu’ is another popular choice (see the sketch after this list).
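
Putting step #5 together, roughly (the exact layer parameters in the original script may differ; the pooling layer here is the global average pooling used in Google’s example):

```python
num_classes = 3   # one output node per person in the dataset

model = tf.keras.Sequential([
    base_model,                                        # frozen MobileNetV2 feature extractor
    tf.keras.layers.Conv2D(32, 3, activation='relu'),  # 2D convolution: 32 filters, kernel size 3
    tf.keras.layers.Dropout(0.2),                      # drop 20% of nodes during training
    tf.keras.layers.GlobalAveragePooling2D(),          # pool the feature maps down to a vector
    tf.keras.layers.Dense(num_classes, activation='softmax'),  # densely connected output layer
])
```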

#6 Compiling the Model

Until now, we have added our classification heads to the untrainable model. Before training the model, we need to compile it first.

  1. model.compile takes an optimizer algorithm; Adam is a popular optimizer and is what we use here.
  2. The weights are adjusted so as to minimise the ‘loss’; mean squared error and cross-entropy are common loss functions.
  3. We also need to define which metric/parameter training progress is measured by; here we go with accuracy.
  4. model.summary prints out a summary of the model.
  5. Then, we print some statistics of the training.
  6. We set 10 epochs, so the model will be trained over 10 iterations.
  7. Finally, we fit/train the model. Here we train only the layers we added in the previous step; the layers frozen with trainable = false are left untouched (a sketch of this step follows the list).
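
Step #6 might look roughly like this (on older TensorFlow 1.x/Keras versions you may need model.fit_generator instead of model.fit when passing generators):

```python
model.compile(
    optimizer=tf.keras.optimizers.Adam(),   # Adam optimizer
    loss='categorical_crossentropy',        # loss for multi-class classification
    metrics=['accuracy'])                   # track accuracy during training

model.summary()                             # print the model architecture

EPOCHS = 10
history = model.fit(
    train_generator,
    epochs=EPOCHS,
    validation_data=val_generator)
```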

#7 Fine-Tuning the Model

In the feature-extraction experiment so far, we were only training a few layers on top of the MobileNetV2 base model; the weights of the pre-trained network were not updated during training.

One way to increase performance even further is to train (or ‘fine-tune’) the weights of the top layers of the pre-trained model alongside the classifier nodes you just added. The training process will force the weights to shift from generic feature maps to features associated specifically with our dataset.

  1. We set the base model to trainable again with trainable = true.
  2. We print some information for our reference.
  3. We fine-tune from layer 100: for this model, the generic bottleneck layers are seen to lie in the first 100 layers, so we set the fine-tune point to 100. This number was taken from the Google Colab example.
  4. We freeze those first 100 bottleneck layers with a for-loop, since training them would contribute nothing to improving our accuracy.
  5. As before, we need to compile the model before training it. This time we pass the Adam() optimizer an argument of 1e-5, which is the learning rate. The lower the learning rate, the more slowly and gradually the model improves towards perfection, because the node weights are tuned only very slightly at each step; you can increase it depending on your requirements. TensorFlow Playground is a great place to visualize how training works: you can play with the “Learning Rate” and see how the accuracy graph changes.
  6. Then we fit/train the model over 5 iterations. To validate our model, we also pass in the validation generator.
  7. After training is done, our model is trained. Hurray! But our job is not finished yet: we have to save the trained model in ‘h5’ format, the format of a trained Keras model.
  8. So we create a saved-model directory and save the model there. Once the script runs, you should see a saved trained-model file (a sketch of this whole step follows below).
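
A sketch of step #7, continuing from above (the directory and file names mirror those mentioned in this article; adjust them to your own project):

```python
import os

# Unfreeze the base model, then re-freeze its first 100 (bottleneck) layers.
base_model.trainable = True
fine_tune_at = 100
for layer in base_model.layers[:fine_tune_at]:
    layer.trainable = False

# Recompile with a very low learning rate so the unfrozen layers change only slightly.
model.compile(
    optimizer=tf.keras.optimizers.Adam(1e-5),
    loss='categorical_crossentropy',
    metrics=['accuracy'])

# Train for 5 more epochs, validating against the validation generator.
history_fine = model.fit(
    train_generator,
    epochs=5,
    validation_data=val_generator)

# Save the trained model in Keras .h5 format.
os.makedirs('saved_model', exist_ok=True)
model.save('saved_model/fine_tuning.h5')
```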

Here is the link to the full commented python script ModelTraining.py.

Converting to TFLite

Great job if you have made it this far! If you are facing any issues, please let us know and we will get back to you as soon as we can.

So, you have a saved trained-model file. Next, we just need to convert this model to a TFLite file, which we can then integrate into iOS, Android, and IoT devices.

This is a separate script which takes the model from the saved directory, converts it to TFLite, and saves the new TFLite file in our project.
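
A sketch of that conversion script, assuming TensorFlow 2.x (with TensorFlow 1.x you would use tf.lite.TFLiteConverter.from_keras_model_file('saved_model/fine_tuning.h5') instead of loading the model first):

```python
import tensorflow as tf

# Load the fine-tuned Keras model saved by ModelTraining.py.
model = tf.keras.models.load_model('saved_model/fine_tuning.h5')

# Convert it to a TensorFlow Lite flat buffer.
converter = tf.lite.TFLiteConverter.from_keras_model(model)
tflite_model = converter.convert()

# Write the converted model to disk.
with open('converted_model.tflite', 'wb') as f:
    f.write(tflite_model)
```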

  • Just for reference, this is what the final project structure looks like.
  • The converted file is converted_model.tflite and the saved model file is fine_tuning.h5.

That’s all, folks! Hopefully you did not face any issues in the above steps. If you have any doubts or suggestions to improve this article, please comment down below; we will surely get back to you as soon as possible!

Recommended Links

  1. Read this Medium story by Oleksii Kharkovyna for a great kick-start if you want to learn about what is working behind the scenes.
  2. If you are interested and want to dig deeper, I highly recommend Kaggle’s Deep Learning course.
  3. To understand neural networks better and be able to visualize them, I highly recommend visiting TensorFlow Playground.
  4. Facial Detection Android using Machine Learning and Firebase — Link
  5. You can refer to this Google Codelab if you wish to get more in-depth insight through a flower classification example.

Special thanks to Himanshu Bansal for contributing to this story.
