I typed these notes while watching the lecture, for quick revision later on. To understand them fully, use them alongside the Jupyter notebook that is available here:
- Kindly use the Jupyter notebook in parallel with these notes for revision.
- The course consists of 7 lessons, and the recommended study pattern is around 10 hours a week, so about 70 hours of DL practice overall.
- We will be using Jupyter notebooks, the fastai library and PyTorch for the course.
- Fastai can be used to solve problems in these four areas: Computer Vision, Natural Language Text, Tabular data and Collaborative filtering.
`from fastai.vision import *` is recommended for learning because, while training models, the most important thing is to be able to interact and experiment quickly.
`bs` means batch size: how many images are trained on at one time.
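To make the idea of a batch size concrete, here is a minimal sketch in plain Python. The helper `make_batches` and the file names are hypothetical and are not part of fastai; the library does the batching for us.

```python
def make_batches(items, bs):
    """Split a list into consecutive chunks of at most `bs` items."""
    return [items[i:i + bs] for i in range(0, len(items), bs)]

# Ten made-up file names stand in for the image set.
image_names = [f"img_{i}.jpg" for i in range(10)]

# With bs=4 the model would see 4 images at a time; the last batch is smaller.
batches = make_batches(image_names, bs=4)
batch_sizes = [len(b) for b in batches]
print(batch_sizes)
```

A larger `bs` means fewer, bigger steps per pass over the data, but each step needs more GPU memory.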
- In production code, import only the libraries you need.
- We will be using an academic dataset and solving a much harder problem than Dogs vs cats. Our model will have to differentiate between 12 cat breeds and 25 dog breeds. This kind of problem, differentiating between similar categories, is called fine-grained classification.
- All images need to be squares of the same size; this helps the GPU train faster. We will be using sz = 224 for all the images, as this size generally gives good results. The reasons will be discussed later in the course.
In the fastai library, everything we model with will be a data bunch object. It contains the training data, the validation data and the test data.
data = ImageDataBunch.from_name_re(path_img, fnames, pat, ds_tfms=get_transforms(), size=224, bs=bs ).normalize(imagenet_stats)
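In the call above, `from_name_re` derives each image's label from its file name using the regular expression `pat`. The notes do not show the actual pattern, so the one below is an assumption, and the file paths are made up. It is only a sketch of how a regex can pull a breed name out of a path.

```python
import re

# Assumed pattern: the label is everything between the last "/" and the
# trailing "_<number>.jpg". The real `pat` is not shown in these notes.
example_pat = re.compile(r"/([^/]+)_\d+\.jpg$")

def label_from_name(fname):
    """Return the label captured by the pattern, or raise if nothing matches."""
    match = example_pat.search(fname)
    if match is None:
        raise ValueError(f"no label found in {fname!r}")
    return match.group(1)

# Hypothetical paths in the style "<folder>/<breed>_<index>.jpg".
example_paths = ["images/great_pyrenees_173.jpg", "images/Bengal_12.jpg"]
labels = [label_from_name(p) for p in example_paths]
print(labels)
```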
`get_transforms()` supplies the transforms applied to the images, such as cropping, resizing and data augmentation.
It really helps when each of the RGB channels has a mean of 0 and a standard deviation of 1, hence we normalize the images.
Models are designed so that the final layer is of size 7x7, so we want an image size that is 7 multiplied by 2 a number of times (224 = 7 x 2^5). Hence we use 224 rather than 256.
As a good practitioner, it is important to look at the data and the labels and check that everything is in order.
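A small sketch of what per-channel normalization does, assuming that `imagenet_stats` holds the per-channel means and standard deviations commonly quoted for ImageNet (the values below). The helper `normalize_pixel` is hypothetical and works on a single pixel with channel values between 0 and 1.

```python
# Commonly quoted ImageNet statistics in (R, G, B) order; assumed, not taken from the notes.
IMAGENET_MEAN = [0.485, 0.456, 0.406]
IMAGENET_STD = [0.229, 0.224, 0.225]

def normalize_pixel(rgb, mean=IMAGENET_MEAN, std=IMAGENET_STD):
    """Subtract the channel mean and divide by the channel standard deviation."""
    return [(value - m) / s for value, m, s in zip(rgb, mean, std)]

# A pixel equal to the channel means maps to zero in every channel.
centered = normalize_pixel(IMAGENET_MEAN)

# A pixel one standard deviation above the mean maps to roughly 1 in every channel.
one_std_above = normalize_pixel([0.714, 0.680, 0.631])
print(centered, one_std_above)
```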
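The arithmetic behind 224 can be checked quickly. The sketch below assumes the network halves the image's width and height five times before the final layer; that count is an assumption made for illustration, not something stated in these notes.

```python
def final_grid_size(image_size, halvings=5):
    """Side length left after halving `image_size` the given number of times."""
    return image_size // (2 ** halvings)

grid_224 = final_grid_size(224)  # 224 = 7 * 2**5
grid_256 = final_grid_size(256)  # 256 = 8 * 2**5
print(grid_224, grid_256)
```

Under that assumption, a 224 pixel input ends at the 7x7 grid the model was designed for, while 256 would end at 8x8.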
In the fastai library, just as a data bunch is the general object used for data, there is something called a learner for training a model. Just as data bunch has a subclass, `ImageDataBunch`, there is a learner for convolutional neural networks, created with `create_cnn`.
`create_cnn` takes the data bunch object as data. We will use resnet34 (a pretrained NN model) first. A resnet34 model has already been trained on 1.5 million pictures of different kinds from a dataset called ImageNet. So we can start with a model that already knows about a thousand categories of things. This is called transfer learning, and it is really important.
Metrics are used to evaluate how the model performed.
Overfitting is where we train the model to recognize only the images in the training set and not general images from that category. To check for that, we use a validation set and compare its metrics to those of the training set.
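As a rough illustration of comparing training and validation metrics, here is a hand-rolled error rate in plain Python. The function and the labels are made up for the example; this is not fastai's own metric code.

```python
def error_rate(predictions, targets):
    """Fraction of predictions that do not match the targets."""
    wrong = sum(p != t for p, t in zip(predictions, targets))
    return wrong / len(targets)

# Made-up predictions: perfect on the training set, poor on the validation set.
train_err = error_rate(["cat", "dog", "cat", "dog"], ["cat", "dog", "cat", "dog"])
valid_err = error_rate(["cat", "cat", "cat", "dog"], ["cat", "dog", "dog", "dog"])
print(train_err, valid_err)
```

A validation error much higher than the training error, as in this toy case, is the usual sign of overfitting.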
Instead of using `fit`, we will use `fit_one_cycle` because it works much, much better.
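The notes do not explain what `fit_one_cycle` does internally. As a simplified sketch of the general one-cycle idea (the learning rate first rises to a maximum and then falls again), the function below uses a linear warm-up followed by a cosine decay. The function name, `start_div` and `warmup_frac` are assumptions chosen for illustration and should not be read as fastai's exact schedule.

```python
import math

def one_cycle_lr(step, total_steps, max_lr, start_div=25.0, warmup_frac=0.3):
    """Learning rate at `step` for a simplified one-cycle schedule."""
    warmup_steps = int(total_steps * warmup_frac)
    min_lr = max_lr / start_div
    if step < warmup_steps:
        # Linear ramp from min_lr up towards max_lr.
        return min_lr + (max_lr - min_lr) * step / warmup_steps
    # Cosine decay from max_lr down to zero at the last step.
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps - 1)
    return max_lr * 0.5 * (1 + math.cos(math.pi * progress))

schedule = [one_cycle_lr(s, total_steps=100, max_lr=1e-3) for s in range(100)]
peak_step = schedule.index(max(schedule))
print(peak_step, schedule[0], schedule[-1])
```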
The academic paper reported an accuracy of 59% in 2012; the model we built with 3 lines of code in 2018/2019 reached 94%.
We can use the `.save()` method on the learner returned by `create_cnn` to save the weights of the newly trained NN model, so we can easily load them again in the future.
- To see what comes out of the model, we can use the `ClassificationInterpretation` class, passing it the learn object. The `learn` object knows both your data and your model, so we get an `interp` object which we use to interpret the results.
- You get a high loss when the model predicts something with a high level of confidence but it is in fact something else. The `top_losses` method surfaces these examples.
- In highly accurate models with lots of classes, it is very hard to look through the results and find the classes that have gone wrong, so we can use the `most_confused` method, which lists the predicted and actual class combinations that the model got wrong most often.
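The two ideas above, high loss for confident mistakes and counting the most confused class pairs, can be sketched in plain Python. This assumes a cross-entropy style loss (the negative log of the probability given to the actual class), which the notes do not state explicitly. The data and names are made up and this is not the fastai implementation.

```python
import math
from collections import Counter

# Made-up validation results: (actual label, predicted label,
# probability the model assigned to the actual label).
results = [
    ("ragdoll", "birman", 0.02),
    ("ragdoll", "birman", 0.30),
    ("birman", "ragdoll", 0.10),
    ("beagle", "beagle", 0.97),
    ("ragdoll", "ragdoll", 0.80),
]

def cross_entropy(prob_of_actual):
    """Loss for one example: large when the actual class got low probability."""
    return -math.log(prob_of_actual)

# Highest-loss examples first, in the spirit of inspecting the top losses.
by_loss = sorted(results, key=lambda r: cross_entropy(r[2]), reverse=True)

# Count each (actual, predicted) pair among the mistakes, most frequent first.
confused_pairs = Counter((a, p) for a, p, _ in results if a != p).most_common()
print(by_loss[0], confused_pairs)
```

The first entry of `by_loss` is the example where the model was most confidently wrong, and `confused_pairs` shows which breeds get mixed up most often.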
Unfreezing the network:
- By default, only the last few layers are retrained, but to get better performance we have to train much more than the last few.
- Different layers of the neural network represent different levels of semantic complexity. It is unlikely that layer 1, which primarily understands building blocks such as lines, diagonals or gradients, is going to be much different if we retrain the model. So instead of retraining it, we keep that layer as it is and change the later layers.
- We can use the learning rate finder to plot the loss against the learning rate and decide which learning rate to use. In `fit_one_cycle` we can pass the maximum learning rate as a Python slice object (which contains start, stop and step).
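One way to read the slice is as a range of learning rates: a small rate for the early layers, which should barely change, and a larger one for the later layers. The sketch below spreads rates geometrically between the two ends of a slice. The helper `spread_learning_rates` and the number of layer groups are assumptions for illustration; the notes do not say exactly how fastai distributes the rates.

```python
def spread_learning_rates(lr_range, n_groups):
    """Geometrically space rates from lr_range.start to lr_range.stop."""
    start, stop = lr_range.start, lr_range.stop
    if n_groups == 1:
        return [stop]
    ratio = (stop / start) ** (1 / (n_groups - 1))
    return [start * ratio ** i for i in range(n_groups)]

# A Python slice can hold floats; here the earliest group gets 1e-6 and the last 1e-4.
rates = spread_learning_rates(slice(1e-6, 1e-4), n_groups=3)
print(rates)
```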