These notes were typed out by me while watching the lecture, for quick revision later on. To fully understand them, they should be used alongside the Jupyter notebooks that are available here:
- Kindly use the Jupyter notebook in parallel with these notes for revision.
- The course consists of 7 lessons, and the recommended study pattern is around 10 hours a week, so overall about 70 hours of DL practice
- We will be using Jupyter notebooks, the fastai library, and PyTorch to do the course
- Fastai can be used to solve problems in these four areas: Computer Vision, Natural Language Text, Tabular data and Collaborative filtering.
How to create your own classifier with your own images?
- Check this notebook; the first half contains all the necessary details to download images from Google. (Created by fastai, inspired by Adrian Rosebrock)
Go through the steps from the first tutorial to create a model
- We need a validation set to know how well the model is doing, so we set aside a validation set automatically and randomly.
- It is important that the selection of the validation set is random, but that randomness must be saved (by fixing the seed) so that we use the same validation set for every iteration of changes.
- Only then can we accurately track how much better our model is actually getting.
To do so, we fix the random seed before creating the validation split:
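A minimal sketch of this (fastai v1 API as used in the course; the dataset folder here is hypothetical):

```python
from fastai.vision import *  # fastai v1 star-import, as used in the course
import numpy as np

path = Path('data/bears')  # hypothetical folder of downloaded images

np.random.seed(42)  # fix the seed so the random 20% split is identical every run
data = ImageDataBunch.from_folder(path, train='.', valid_pct=0.2,
                                  ds_tfms=get_transforms(), size=224)
data.normalize(imagenet_stats)
```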
On the learning rate finder plot, look for the longest downward slope that sticks around for a while, and pick a learning rate on the steepest bit you find in that area.
Whether the top learning rate is 1e-4 or 2e-4 doesn't really make that much difference.
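A minimal sketch of producing that plot (fastai v1 API, assuming a `learn` object from the earlier steps):

```python
learn.lr_find()        # sweeps learning rates while recording the loss
learn.recorder.plot()  # plot loss vs. LR; read a value off the steepest descent
```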
Cleaning up the dataset:
Once we run the model, we can plot the top losses in both the training and validation datasets to figure out where the model got things wrong. Sometimes it is simply because of noise in the image.
We can delete those images using the fastai file-deletion widget, as sketched below.
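A minimal sketch of the inspection and cleaning steps (fastai v1 API; the widget's name varied across v1 releases, `ImageCleaner` in later ones; `learn` and `path` are assumed from the earlier steps):

```python
from fastai.vision import *
from fastai.widgets import DatasetFormatter, ImageCleaner

interp = ClassificationInterpretation.from_learner(learn)
interp.plot_top_losses(9, figsize=(15, 11))  # the images the model got most wrong

# build a dataset ordered by loss and open the Jupyter cleaning widget
ds, idxs = DatasetFormatter().from_toplosses(learn)
ImageCleaner(ds, idxs, path)  # path: the dataset folder from earlier
```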
Putting the model in production:
- For the vast majority of cases, you want to deploy on a CPU. You would rarely need to process 64 images at once.
- Yes, the GPU is about 10-20x faster, but it is a hassle to batch everything up and run it together. Also, if you have to scale it, it's more hassle.
With the CPU, instead of 0.01s it would take 0.2s, which is more than acceptable given that it is really cheap, easy to set up, and horizontally scalable.
Using a trained model to make predictions is called inference.
How to use a trained model for inference?
First we export all the information about the Learner object into a pickle file using learn.export(), which saves a file called 'export.pkl'.
Then we create a Learner object for the image(s) that we want to classify:
learn = load_learner(path)
We create our Learner in the production environment like this; just make sure that path contains the file 'export.pkl' from before.
This Learner object has no data in it, but it knows how to transform a new image in the same way as your training images.
Then we call learn.predict(img) to get the result.
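A minimal end-to-end sketch of these inference steps (fastai v1 API; the folder and image paths are hypothetical):

```python
from fastai.vision import *

learn = load_learner('data/bears')            # folder containing export.pkl
img = open_image('data/bears/test/grizzly.jpg')  # hypothetical new image
pred_class, pred_idx, probs = learn.predict(img)
print(pred_class, probs[pred_idx])
```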
To use the model in a web app, we have to wrap it up using Flask or Starlette to serve it as a REST API.
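A minimal sketch of a Starlette endpoint serving predictions (the route name and paths are hypothetical; assumes export.pkl sits in the app directory):

```python
import io
import uvicorn
from starlette.applications import Starlette
from starlette.responses import JSONResponse
from starlette.routing import Route
from fastai.vision import *

learn = load_learner('.')  # directory containing export.pkl

async def classify(request):
    form = await request.form()
    img_bytes = await form['file'].read()   # uploaded image file
    img = open_image(io.BytesIO(img_bytes))
    pred_class, pred_idx, probs = learn.predict(img)
    return JSONResponse({'prediction': str(pred_class)})

app = Starlette(routes=[Route('/classify', classify, methods=['POST'])])

if __name__ == '__main__':
    uvicorn.run(app, host='0.0.0.0', port=8000)
```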
What happens when we run into a problem:
Generally the issue is one of:
1. Learning rate
2. Number of epochs
Here is how to tell which one is wrong:
- If the validation loss explodes to a really high value, the learning rate is too high.
- If, over the first few epochs, the error_rate is going down really slowly, then the learning rate is too low.
- When there are too few epochs, the training loss is much higher than the validation loss
- When there are too many epochs, the model starts overfitting
- The only way to figure out if we are overfitting is to watch the error rate: if it goes down for a while and then starts going up again, you are overfitting.
You never want a model where your training loss is higher than your validation loss! It means either that your learning rate is too low or that you haven't fitted enough, i.e. not run enough epochs.
A lot of people say that having validation loss higher than your training loss means overfitting. That is incorrect: any model that is trained correctly will have training loss lower than validation loss.
Metrics are always computed on the validation set.
The error rate is 1 - accuracy, where accuracy is the mean of how often the argmax of the predictions equals the target value.
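A minimal sketch of both metrics in plain PyTorch (an assumed illustration, not the fastai source):

```python
import torch

def accuracy(preds, targets):
    # preds: (n, n_classes) model outputs; targets: (n,) true class indices
    return (preds.argmax(dim=1) == targets).float().mean()

def error_rate(preds, targets):
    return 1 - accuracy(preds, targets)
```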
Why is 3e-3 used as the default learning rate?
- It was found from experiments that 3e-3 is a great learning rate before you unfreeze. You go from there.
Then, in the next stage, you use something that is about 10 times lower than that:
learn.fit_one_cycle(4, slice(xxx, 3e-4))
xxx is the number you find from the learning rate finder.
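A minimal sketch of the whole recipe (fastai v1; here `lr` stands in for the 'xxx' read off the LR finder plot):

```python
learn.fit_one_cycle(4, 3e-3)             # stage 1: frozen body, default LR
learn.unfreeze()                         # stage 2: train the whole network
learn.lr_find()
learn.recorder.plot()                    # read lr off the steepest region
learn.fit_one_cycle(4, slice(lr, 3e-4))  # discriminative LRs across layer groups
```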
- Going bottom-up instead of top-down: we'll create the simplest possible model we can, and we'll learn about PyTorch as well.
- The tutorial follows through with explaining gradient descent, which is almost exactly this post.
Notes on PyTorch
- A tensor is nothing but a multi-dimensional matrix/array. A tensor has a rank (the number of axes) and axes.
- An image is a rank-3 tensor. If you have 64 images, it would be a rank-4 tensor of dimensions 64 x 480 x 640 x 3.
Here 64 is the number of images, 480 x 640 is the resolution, and 3 is because a colour image has 3 planes (RGB).
The syntax to create one goes as follows:
x = torch.ones(n, 2)
The number of rows will be n and the number of columns will be 2, and the tensor will contain only ones.
x[:,0] means all the rows in the 0th column.
uniform implies a uniform random number, but in PyTorch a trailing underscore, i.e.
uniform_, means the operation is in place: it doesn't return a new value but replaces the current one with the random numbers instead.
x[:,0].uniform_(-1.,1) replaces all the rows in the 0th column with random numbers between -1 and 1.
The reason there is a -1. (with the trailing dot) is that without it the number would be an int. Python doesn't need you to write (-1.0, 1.0); it will understand you need floats if you just put -1.,1.
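Putting the tensor operations above together (runnable as-is):

```python
import torch

n = 100
x = torch.ones(n, 2)      # n rows, 2 columns, filled with ones
x[:, 0].uniform_(-1., 1)  # in-place: column 0 becomes uniform samples in [-1, 1]
print(x[:5])              # first five rows; column 1 is still all ones
```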
backward() is used to calculate the gradients.
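A tiny assumed example of what backward() does:

```python
import torch

a = torch.tensor(2.0, requires_grad=True)
loss = (3 * a) ** 2  # loss = 9 * a^2
loss.backward()      # computes d(loss)/da = 18 * a
print(a.grad)        # tensor(36.)
```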
Stochastic Gradient Descent
- The only difference between Gradient Descent and Stochastic Gradient Descent is something called mini-batches.
When we have lots of image data, we cannot load it all into memory. For instance, if we have 1000 images, we could perform gradient descent on 64 images (one mini-batch) at a time, as sketched below.
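A minimal sketch of SGD with mini-batches, in the spirit of the lesson's linear-model example (the data and coefficients here are assumed, not the notebook's exact values):

```python
import torch

# synthetic data: y = 3*x + 2 plus noise, built with the tensor pattern from above
n = 1000
x = torch.ones(n, 2)
x[:, 0].uniform_(-1., 1)
y = x @ torch.tensor([3., 2.]) + torch.randn(n) * 0.1

a = torch.zeros(2, requires_grad=True)  # parameters to learn
lr, bs = 0.1, 64
for epoch in range(10):
    for i in range(0, n, bs):
        xb, yb = x[i:i+bs], y[i:i+bs]       # one mini-batch of up to 64 rows
        loss = ((xb @ a - yb) ** 2).mean()  # MSE on just this batch
        loss.backward()                     # gradients w.r.t. a
        with torch.no_grad():
            a -= lr * a.grad                # gradient step
            a.grad.zero_()                  # reset for the next batch
print(a)  # should be close to [3., 2.]
```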
As shown in the cover picture, we are looking for a mathematical function that fits just right.
There are other methods to make sure we don't overfit; these are called regularization.
The main way to know if your model is underfitting or overfitting is through the validation set and hence it is essential!
For detailed info on this, check out the post by Rachel Thomas: How (and why) to create a good validation set