Feline Classifier
This is one of my first attempts at building a deep learning model.
- Intro
- Step 0: Get the libraries
- Step 1: Collecting training data
- Step 2: Training
- Step 3: Retraining
- Step 4: Inference using Model and Export for Deployment
- Step 5: Creating a Notebook App
- Step 6: Developing a Real App
Feline Classifier: Cheetah or Leopard or Jaguar
This program creates an image classifier to help solve the age-old conundrum I have about distinguishing three felines: the cheetah, the leopard and the jaguar. I want to see if a simple deep neural network can do a better job than me (someone who has only just learned to tell a cheetah from a leopard).
The question has been on my mind ever since I heard stories of leopards (or panthers) on the university campus where I did my masters. I never actually saw one, but I caught myself consistently misusing the names (including cheetah). The whole confusion stopped only when I left the campus, because I was no longer worried about the semantics of the animal that might attack me while walking through the campus at 2 AM. (This is a reflection from my time at IIT Bombay, India, where the campus is known to get leopard visits.)
What's the point?
There is no point, like there is no point in life.
I want to learn the process of a machine learning product development: From data collection to production.
How am I doing it?
- Training/Validation Data: Bing Image Search through Azure (150 images per group)
- Neural Network Training: Resnet18 (Neural net) with Fastai (PyTorch)
- Google Colab for the GPU resource
- Binder for the production version
Who gave you this insane idea?
The feline classifier? That's mine. I will patent it if I can. But well, what's the point?
You mean developing an ML product? You have to blame Jeremy Howard of the Fast.ai team. He is a nice guy.
So you did everything on your own?
Ha ha, no way. I'm just following Chapter 2 of Jeremy's fastai book.
You did not get any other ideas?
Glad you asked. I also tried a few more. I developed a classifier to distinguish between cricketers (bowlers, batsmen) and baseball players (pitchers, batters). (I'm from a former British colony, no points for guessing that). But the training data was pretty disappointing and I ended up spending too much time battling with that. It is available here: Cricball-classifier
import fastbook
fastbook.setup_book()
And get the vision components from the fastai library.
Now, don't come at me with all guns blazing against the import *; go and talk to Jeremy. He has assured us that there won't be much namespace pollution. Anyway, what is there to worry about namespace pollution in a Jupyter notebook?
from fastbook import *
from fastai.vision.widgets import *
I did not think this would be so confusing. In the video, Jeremy just says go and get your Microsoft Azure account, then get the key, and you can use them from the Jupyter notebook to get the image files.
This was the moment I realized how much the world has changed since I last did some serious internet treasure hunting. I was an early pioneer in this kind of exploration (I even had a website back in 2002, hosted by my undergraduate university), but of late the cloud and everything around it has been moving at such breakneck speed that I felt like a child lost in a festival crowd.
So I was aimlessly wandering around my Microsoft Azure account, not knowing what I was doing. I even confused my unique USERID for the API key and used it, only to have the API scold me for being stupid.
Anyway, I found a saviour in the form of an Azure expert who had joined a conference call to learn Python together. He took me through the tangled maze. The idea is something like this (I have actually simplified it a lot and I'm not sure the following steps are enough; in fact, Microsoft recently moved the image search API elsewhere, so I'm not even sure how it will work now).
- Create a Microsoft Azure account (if you have a Microsoft account, use it). The account is free, though after 30 days you will have to subscribe on a 'pay as you go' basis. However, the image search that we do stays within the free tier, so we don't have to pay anything.
- Go to your portal
- Create a resource (+)
- Bing Search API
- Once you have created the API resource, check your keys under "Keys and Endpoints" in RESOURCE MANAGEMENT.
There are two keys available (it is also suggested that they be changed regularly); either one will do.
key = 'abcdx2123dasfk4353m434'
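(As a small aside: rather than pasting the key into the notebook, it can be read from an environment variable. A minimal sketch, where the variable name AZURE_SEARCH_KEY is my own choice and not anything official:)
import os

# Read the key from an environment variable; fall back to a placeholder if it is unset.
key = os.environ.get('AZURE_SEARCH_KEY', 'XXX')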
Now, to actually collect the data.
The first step here is to name the types of images we are trying to get (which will also form the basis for our image search).
image_types = 'cheetah','leopard','jaguar'
path = Path('feline_classifier')
The following code will do it. Some points to note:
- If the image search keyword has a space in it (e.g. African Cheetah), a separate image_folder_name variable should be created so that it holds acceptable folder names. In the code below, (path/type) uses the image_types variable to create the folder name, and a folder name containing a space would not be ideal to manipulate.
- There are two fastai functions used in this code (the signatures below come from the fastbook module):
- search_images_bing(key, term, min_sz=128), which calls the appropriate API (api.cognitive.microsoft.com) with the key and an image search query for 150 images (that's the limit imposed by Microsoft, it seems).
- download_images(dest, url_file=None, urls=None, max_pics=1000, n_workers=8, timeout=4), which downloads the images listed in the text file url_file (or in the urls list) to the path dest, at most max_pics of them.
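For the curious, here is roughly what a call like search_images_bing boils down to at the REST level: a GET request to the Bing Image Search endpoint, with the key in a header and the search term as a query parameter. This is only a hedged sketch on my part, not the authoritative fastbook source; the endpoint URL has moved over time, and the raw REST results use camelCase field names (e.g. contentUrl) rather than the content_url attribute the fastbook helper exposes.
import requests

def bing_image_search_sketch(key, term, count=150):
    # Endpoint URL is an assumption; Microsoft has moved this API around over time.
    search_url = 'https://api.bing.microsoft.com/v7.0/images/search'
    headers = {'Ocp-Apim-Subscription-Key': key}
    params = {'q': term, 'count': count}
    response = requests.get(search_url, headers=headers, params=params)
    response.raise_for_status()
    # 'value' is a list of dicts, each describing one image result.
    return response.json()['value']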
Neat, isn't it?
if not path.exists():
    path.mkdir()
    for type in image_types:
        dest = (path/type)
        dest.mkdir(exist_ok=True)
        results = search_images_bing(key, f'{type}')
        download_images(dest, urls=results.attrgot('content_url'))
We can check whether the files have been downloaded. Before we use these image files for training the model, it is a good idea to verify that none of them are just dead links. Fastai has a function for that: verify_images(fns), which checks whether any of the files in the input list cannot be opened. To create this input list, we use the function get_image_files(path), which returns all the image files present in the subfolders of the path (by recursing into everything).
fns = get_image_files(path)
failed = verify_images(fns)
We can check how many files failed to open by looking at the failed variable. To remove those invalid files, we can unlink them using the map() method.
failed.map(Path.unlink)
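For instance, a quick check on how many downloads were broken is simply the length of that list (nothing fastai-specific here):
len(failed)  # number of image files that could not be opened (and were just unlinked)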
Now, the data collection process is done.
Before we use the data to train a neural net, it should be assembled in a particular form. In fastai these are called DataLoaders.
Raw data --> DataLoaders: the data we got from the Bing image search requires some organizing, the least of which is splitting it into training and validation sets. Other actions include specifying how to get the image files, how to obtain the labels, and how to preprocess the images (images are usually resized to a standard size, e.g. 224 px, for training neural nets). This is handled in fastai using the DataBlock class.
felines = DataBlock(
    blocks=(ImageBlock, CategoryBlock),
    get_items=get_image_files,
    splitter=RandomSplitter(valid_pct=0.2, seed=42), # says 20% validation data
    get_y=parent_label,
    item_tfms=Resize(128)) # The type of transform to perform: Resize.
felines here corresponds to an instance of the DataBlock class that we will manipulate for our purpose. The key element for us is item_tfms, which can take many forms. When the Resize option is given no specific input, it simply resizes the entire image to the required size. We can also add a ResizeMethod option inside the Resize function to make it behave differently.
- ResizeMethod.Squish: squish the image file
- ResizeMethod.Pad (with the additional pad_mode= option): add empty padding to the edges of the image file
We can also replace Resize entirely, in this case with RandomResizedCrop (with the min_scale= option), which randomly resizes the image by cropping different parts of the same image (a code sketch of these options follows below).
In theory, RandomResizedCrop is better because it shows the model many different crops of the image under study. But I'm skeptical whether this holds for feline detection, as the patterns on the bodies of the felines might look the same from crop to crop. On the other hand, the faces of a cheetah and a leopard should be easy for the neural net to distinguish.
One advantage of RandomResizedCrop in our case is that it will create more input data from the 150-odd images that we have. Yay!
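For reference, here is how those alternative item_tfms could be written (a sketch assuming the fastai Resize / ResizeMethod / RandomResizedCrop API; the sizes and the pad_mode value are just illustrative choices):
# Squish the whole image into a 128x128 square (may distort the aspect ratio)
felines_squish = felines.new(item_tfms=Resize(128, ResizeMethod.Squish))

# Keep the aspect ratio and pad the borders instead (here with plain zero padding)
felines_pad = felines.new(item_tfms=Resize(128, ResizeMethod.Pad, pad_mode='zeros'))

# Randomly crop a different part of the image each time, keeping at least half of its area
felines_crop = felines.new(item_tfms=RandomResizedCrop(128, min_scale=0.5))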
This process is called data augmentation. And with all data at the same size, running them in batch mode on GPUs becomes easy (now we know: like wheat enslaved us into becoming farmers, GPUs are enslaving us into taking photographs in a specific format. Huh, colonialists!! </sarcasm>). To do this, we add another parameter/option to the class, called aug_transforms.
We can change the item_tfms of the class after initialisation by using the new method associated with the class, i.e. felines.new(item_tfms=).
felines = felines.new(item_tfms=RandomResizedCrop(224,min_scale=0.5), batch_tfms=aug_transforms() )
Once we have ensured that the felines object contains everything we want, we apply the dataloaders method of the class to the data of interest to create a DataLoaders object.
dls = felines.dataloaders(path)
Now, we can happily go and train our neural net. Yay. So which neural net are we going to use?
ResNet-18! Woohoo! That sounds big. Is it? What is it? Well, it is a pre-trained PyTorch convolutional neural network model for residual learning of images.
Whoa! Whoa! Hold on. Too much jargon.
- Convolutional Neural Network (CNN): for now, I just know that they are pretty good with images.
- Pre-trained: it appears we don't need to train a neural net from scratch when it comes to images. Even when the images of interest are completely, utterly unrelated, some basic pretraining on a vast image database appears to be better than starting from scratch. How ResNet-18 is trained, etc., is for another time (clue: residual learning). For now, we are going to work on top of somebody else's work, even if they may never have tried to answer the most difficult question of all: "Is it a jaguar or a leopard?"
This appears to be one of the very good examples of sitting on the shoulders of giants.
We will use the cnn_learner function from fastai, which takes in a DataLoaders object and an architecture:
cnn_learner(dls, arch, loss_func=None, pretrained=True,
    cut=None, splitter=None, y_range=None, config=None, n_out=None,
    normalize=True, opt_func=<function Adam at 0x000001DDAE66C670>,
    lr=0.001, cbs=None, metrics=None, path=None, model_dir='models',
    wd=None, wd_bn_bias=False, train_bn=True, moms=(0.95, 0.85, 0.95),
)
Our interest lies mostly in the components on the first line. The architecture we will use is ResNet-18, and the metric we will use is error_rate.
learn = cnn_learner(dls, resnet18, metrics=error_rate)
learn.fine_tune(2)
learn is an instance of the Learner class in fastai, which contains several methods. The first one we use is fine_tune(), which essentially tunes the model. The one parameter it takes is the number of epochs, i.e. the number of times we ask the algorithm to look at each image. We use fine_tune() instead of fit() because we are starting from a pre-trained model (and don't want to lose all those nice parameter values already obtained by someone else when we begin our own tuning). Here, we are asking for 2 epochs, but we could do more as well.
Once the model is tuned, we have to see how well it performed on the validation data set. We go for a confusion-matrix-based visualization of the performance, in two steps. First, we create an interpretation object using the from_learner method of the fastai class ClassificationInterpretation. This interpretation object contains the plot_confusion_matrix() method, which will show us how the classification went.
interp = ClassificationInterpretation.from_learner(learn)
interp.plot_confusion_matrix()
The same interpretation object contains other useful information about the model we have trained and its performance. One example is the plot_top_losses(x, nrows=) method, which plots the images where the model performed worst. We can ask the method to show the top x images, splitting the visualization across nrows rows.
interp.plot_top_losses(10, nrows=2)
One advantage of doing simple image classification is that we can use our own eyes to judge how bad things are (okay, I know, I still can't distinguish between a jaguar and a leopard, but I'm talking in general; for example, in the cricball classifier, I was absolutely great at distinguishing between a pitcher and a bowler or a batsman and a batter). The top losses let us do that. It is also indirectly a way to clean the data, because we did not do any check on whether the labels were right. After all, Bing image search doesn't come with any warranty that the images correspond to the keyword (if I were a lawyer, I would check whether one could file a class action suit against Microsoft for wrongly labeled images. Fortunately, I'm not a lawyer and I don't live in the US).
As you may guess, fastai has a way to clean the data, and unsurprisingly it is called ImageClassifierCleaner. It is a class that provides a widget to help clean the images in the learner obtained from cnn_learner training. The widget lets one either delete an image or give it the right label if it has been wrongly labeled (I promise I won't touch the jaguar vs leopard mistakes), across both the training and validation data.
cleaner = ImageClassifierCleaner(learn)
cleaner
Once we have done the cleaning with the widget, we apply the changes by unlinking the images marked for deletion and moving the images whose labels were wrong to the right folders.
In this analysis, I found so many Jaguar cars and had to delete them
for idx in cleaner.delete(): cleaner.fns[idx].unlink()
for idx,cat in cleaner.change(): shutil.move(str(cleaner.fns[idx]), path/cat)
The above may throw an error, for example if there are no images marked for deletion. That's not going to do much damage; just comment out that line and run the other one.
Once we have cleaned up the data, we have to rebuild the DataLoaders and rerun the model fine-tuning. Let's remember to keep the seed (42 is chosen for a reason: it is the answer to life, the universe and everything).
felines = DataBlock(
    blocks=(ImageBlock, CategoryBlock),
    get_items=get_image_files,
    splitter=RandomSplitter(valid_pct=0.2, seed=42),
    get_y=parent_label,
    item_tfms=Resize(128))
felines = felines.new(item_tfms=RandomResizedCrop(224, min_scale=0.5), batch_tfms=aug_transforms() )
dls = felines.dataloaders(path)
learn = cnn_learner(dls, resnet18, metrics=error_rate)
learn.fine_tune(3) # This time, one more epoch of fine-tuning
interp = ClassificationInterpretation.from_learner(learn)
interp.plot_confusion_matrix()
Did the confusion matrix get better? If not, check the top losses again and go through the same retraining steps a bit more. But if I do it too many times, I might end up populating the AI with my own crooked intelligence. So perhaps one or two rounds is enough.
The fine-tuned model is now in learn. We can export it into a format that lets it be reused without anyone having to retrain the model. That is, while we need GPUs for training (cnn_learner()), we do not actually need GPU resources for running the prediction component of the model. For this purpose, we will export the model in pickle format.
The pickle module implements binary protocols for serializing and de-serializing a Python object structure.
learn.export()
We should now have an 'export.pkl' file in the current folder, which can be checked as follows:
path = Path()
path.ls(file_exts='.pkl')
This pkl file can be taken wherever we want, and if we have the appropriate tools, we can use it to create an application. As a first step, we will create an application within the IPython notebook. To simulate how this would happen in any IPython notebook, we load the pickle file (called unpickling) into an object that we will use for inference.
learn_inf = load_learner(path/'export.pkl')
To test whether a given image is a specific type of feline, we just have to use the predict method and specify the image location.
dest = 'jaguar1.jpg'
link_feline = 'https://l-express.ca/wp-content/uploads/2020/06/Jaguar-Bresil-1024x576.jpg'
download_url(link_feline, dest)
im = Image.open(dest)
im.to_thumb(128,128)
dest = 'jaguar2.jpg'
link_feline = 'https://upload.wikimedia.org/wikipedia/commons/e/ee/Jaguar_full.jpg'
download_url(link_feline, dest)
im = Image.open(dest)
im.to_thumb(128,128)
learn_inf.predict('jaguar2.jpg')
!zip -r ./feline_images.zip ./feline_classifier/
The output contains three components:
- The class name, as in the folder name or the keyword used
- The tensor index, that is, the position of the inferred class in the vector/array (tensor is the PyTorch way of saying array)
- The list of probabilities corresponding to each class
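To make that concrete, here is how the three components can be unpacked (a small sketch; the actual class name and the numbers will of course depend on the trained model):
pred_class, pred_idx, probs = learn_inf.predict('jaguar2.jpg')
print(learn_inf.dls.vocab)   # the ordered list of classes that the index refers to
print(pred_class)            # e.g. 'jaguar' -- the folder name / search keyword
print(pred_idx)              # e.g. tensor(2) -- position of that class in the vocab
print(probs[pred_idx])       # probability assigned to the predicted class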
Since I worked on Google Colab, I had to download the file to my PC and then move it wherever I wanted (even within my own PC, which doesn't have a GPU).
from google.colab import files
files.download('export.pkl')
What do we need to create an application?
- iPython widgets
- Voilà
ipywidgets bring JavaScript + Python to web browsers, so we can run an application inside Jupyter notebooks.
Voilà takes us one step further and eliminates the need for a Jupyter notebook as well: the end user can access the application with just a browser.
The code in this section can be deployed separately in any Jupyter notebook
First, we need a file upload widget through which we can upload an image file into the app
btn_upload = widgets.FileUpload()
btn_upload
This gives us a widget; clicking it lets us feed the backend an image file (or whatever the case may be).
img = PILImage.create(btn_upload.data[-1])
We can then use the Output widget to display the uploaded image.
out_pl = widgets.Output()
out_pl.clear_output()
with out_pl: display(img.to_thumb(128,128))
out_pl
The predictions can then be obtained through inference. Remember that we assume we already have the export.pkl file; if not, we upload it. In the following, we use Google Colab to do it.
from google.colab import files
files.upload()
Funnily enough, the above mechanism (uploading a file) is exactly what we are trying to replicate. Now we load the pkl file:
learn_inf = load_learner(path/'export.pkl')
Since we already have the file in our current folder, we skip the upload and go directly to prediction.
pred, pred_idx, probs = learn_inf.predict(img)
And we use a Label widget to display the results.
lbl_pred = widgets.Label()
lbl_pred.value = f'Prediction: {pred}; Probability: {probs[pred_idx]:.04f}'
lbl_pred
We can also add a button to classify, kind of separating the uploading process from the classifying process.
btn_run = widgets.Button(description='Classify')
btn_run
Since we have a button, we also need a way to handle the event of clicking it. So we put the above steps (predict, display, etc.) together into an event handler.
def on_click_classify(change):
    img = PILImage.create(btn_upload.data[-1])
    out_pl.clear_output()
    with out_pl: display(img.to_thumb(128,128))
    pred,pred_idx,probs = learn_inf.predict(img)
    lbl_pred.value = f'Prediction: {pred}; Probability: {probs[pred_idx]:.04f}'

btn_run.on_click(on_click_classify)
Now we can put all of these together (in a vertical box called VBox):
btn_upload = widgets.FileUpload()
VBox([widgets.Label('Select your Feline!'),
btn_upload, btn_run, out_pl, lbl_pred])
For this step, there is nothing new in terms of code: we just take everything from Step 5, put it in a new Jupyter notebook, and place it all in a single folder.
Then install Voilà, if it is not already there, and connect it to the current Jupyter notebook.
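For completeness, this is roughly how that looks in a notebook cell (a sketch based on how the fastai book does it; depending on how Voilà is installed, the server-extension step may not be needed):
!pip install voila
!jupyter serverextension enable --sys-prefix voila
After that, replacing 'notebooks/' with 'voila/render/' in the notebook's URL should serve it as a Voilà app, showing only the markdown and widget outputs and hiding the code cells.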