These notes were typed out by me while watching the lecture, for a quick revision later on. To fully understand them, they should be used alongside the Jupyter notebooks that are available here:
- Kindly use the Jupyter notebooks in parallel with these notes for revision.
- The course consists of 7 lessons, and the recommended study pattern is around 10 hours a week, so around 70 hours of DL practice overall.
- We will be using Jupyter notebooks, the fastai library, and PyTorch throughout the course.
- Fastai can be used to solve problems in these four areas: computer vision, natural language text, tabular data, and collaborative filtering.
The data block API makes every choice involved in setting up your data a separate block:
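Here is a minimal sketch of what the first few blocks might look like for the planets dataset in fastai v1; the path and file names are assumptions based on the standard lesson setup:

```python
from fastai.vision import *

# Hypothetical path and file names, following the usual lesson setup.
path = Path('data/planet')
src = (ImageList.from_csv(path, 'train_v2.csv', folder='train-jpg', suffix='.jpg')
       .split_by_rand_pct(0.2)           # how to split: random 20% validation
       .label_from_df(label_delim=' '))  # how to label: space-delimited tags
```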
In the fastai repo, the documentation is itself made of Jupyter notebooks, so we can experiment with it ourselves.
When you look into what a DataBunch object is, it binds together a training-set dataloader and a validation-set dataloader.
tfms, in the case of the planets dataset:
flip_vert is set to True because, unlike with cats or dogs, vertically flipping a satellite image doesn't change what it means. Doing so helps our model generalize better.
The max_warp parameter of get_transforms controls perspective warping. Since satellite photos are always taken from the same angle (straight down), we set it to 0.
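A minimal sketch of these transforms, continuing the data block chain from above (the size and batch size here are illustrative):

```python
# flip_vert=True: vertical flips are valid for satellite imagery.
# max_warp=0: no perspective warping; satellites always shoot straight down.
tfms = get_transforms(flip_vert=True, max_warp=0.0)

data = (src.transform(tfms, size=128)  # how to transform, and at what size
        .databunch(bs=64)              # how to batch it
        .normalize(imagenet_stats))    # normalize with ImageNet stats
```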
To create a learner we use the same function as in lesson 1. Our base architecture is resnet34 again, but the metrics are a little different. We can pass a list of metrics, so we pass acc_02 and f_score (both explained below).
In the previous example it was single-label classification, so we determined the prediction by picking the biggest activation (argmax).
This is multi-label classification, so here we check each activation (which is between 0 and 1) and select the ones that are above a certain threshold (0.5 by default).
Finally we compare with the ground truth to get the accuracy. This function is called accuracy_thresh.
We use partial from Python's functools to create custom functions with a different default. We create acc_02 with 0.2 as the new default threshold.
f_beta is used to weigh up the false positives and false negatives. There are various scores like F1 and F2; Kaggle asked for F2, so that is what we use.
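A sketch of how those metrics and the learner might be set up, assuming the fastai v1 metric functions accuracy_thresh and fbeta:

```python
from functools import partial
from fastai.vision import cnn_learner, models, accuracy_thresh, fbeta

# partial bakes in a new default threshold of 0.2 for both metrics.
acc_02 = partial(accuracy_thresh, thresh=0.2)
f_score = partial(fbeta, thresh=0.2)  # fbeta defaults to beta=2, i.e. F2
learn = cnn_learner(data, models.resnet34, metrics=[acc_02, f_score])
```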
Then we go on to do all the same steps: run lr_find, pick a learning rate from the plot, train with fit_one_cycle, and save the model.
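Sketched out, with an illustrative learning rate rather than the lesson's exact value:

```python
learn.lr_find()
learn.recorder.plot()                # pick the lr at the steepest downward slope
learn.fit_one_cycle(5, slice(1e-2))  # illustrative lr and epoch count
learn.save('stage-1')
```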
How do you train over misclassified images?
First we have to find the misclassified images using a feedback system. The system should log the file name, the predicted class, and the actual class.
Once we gather enough of this data, we can create a new databunch of misclassified images and fit with that.
These images are going to be particularly interesting, so we might want to fit with a slightly higher learning rate, or run a few more epochs, so that they have more of an impact.
Do the data block API's blocks need to be in a certain order?
Yes they do!
What kind of data -> how do you label it -> how do you split it -> what datasets do you want -> (optionally) how do you transform it -> how do you create a databunch from it.
- Before we unfreeze, we pick the learning rate at the steepest slope, right before the loss rises.
- When we unfreeze, the plot has a different shape, and it is a bit harder to say what to look for. The rule of thumb is to find the point right before the loss shoots up and go back by about 10x (see the sketch after this list).
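A minimal sketch of that fine-tuning step; the learning-rate bounds here are illustrative, not the lesson's exact numbers:

```python
learn.unfreeze()
learn.lr_find()
learn.recorder.plot()
# Lower bound: ~10x below where the loss shoots up.
# Upper bound: roughly the frozen-stage lr divided by 5 or 10.
learn.fit_one_cycle(5, max_lr=slice(1e-5, 1e-3))
learn.save('stage-2')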
How to increase score further?
Kaggle gave us 256x256 images, and we used size 128x128 to be able to experiment quicker.
Now we can actually take this 128px model and do transfer learning with a new databunch of the 256x256 images, then unfreeze and train more. Sure enough, it increases accuracy by a significant amount.
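This progressive-resizing trick might look like the following, reusing src, tfms, and learn from the sketches above:

```python
# Same source data, but transformed to the full 256px resolution.
data_256 = (src.transform(tfms, size=256)
            .databunch(bs=32)  # bigger images -> smaller batch fits in memory
            .normalize(imagenet_stats))

learn.data = data_256  # the 128px model now trains on 256px images
learn.freeze()         # start by retraining just the head again
learn.fit_one_cycle(5, slice(1e-2))  # illustrative values
```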
Image Segmentation (the CamVid problem):
The CamVid problem is interesting, as it is a classification problem for every single pixel in every single image. Is that pixel part of a building, a bicycle, a human, the road? That is called segmentation.
Since we are classifying every single pixel, it is a GPU-heavy task, so the batch size is 8. We need an already-labelled dataset, which is the CamVid dataset.
The procedure is exactly the same here; we just use unet_learner instead of the CNN learner (sketched below).
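A hedged sketch of the CamVid setup in fastai v1; get_y_fn (mapping an image path to its mask path), codes, size, and acc_camvid are assumptions carried over from the lesson notebook:

```python
# get_y_fn, codes, size and acc_camvid are assumed from the lesson notebook.
src = (SegmentationItemList.from_folder(path_img)
       .split_by_fname_file('../valid.txt')
       .label_from_func(get_y_fn, classes=codes))
data = (src.transform(get_transforms(), size=size, tfm_y=True)  # transform masks too
        .databunch(bs=8)  # small batch: per-pixel classification is GPU-heavy
        .normalize(imagenet_stats))
learn = unet_learner(data, models.resnet34, metrics=acc_camvid)
```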
learn.recorder.plot_losses() plots the training loss and the validation loss.
On plotting the learning rate (learn.recorder.plot_lr()), we see that it goes up and then comes down. This is because we use fit_one_cycle.
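For reference, both plots come straight off the recorder:

```python
learn.recorder.plot_losses()  # training vs. validation loss over iterations
learn.recorder.plot_lr()      # the one-cycle schedule: warm-up, then annealing
```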
What is it and why do we do this?
It turns out loss functions tend not to have smooth surfaces; they have flat areas and bumpy areas.
If you end up at the bottom of a bumpy area, you probably will not generalize well, since that solution is good for that particular spot only.
Whereas, when you find a spot in a flat area, it is not only good at that one point but also good around it.
A low learning rate means we can get trapped in these small valleys. It turns out that gradually increasing the learning rate is a great way for the model to explore the whole surface, avoid these small valleys, and move towards a flatter region.
Then we start reducing the learning rate (learning rate annealing) to find the lowest error.
So when plotting the losses, the loss should go up slightly and then come down. If it only goes down, try increasing the maximum learning rate, and experiment with more or fewer epochs.
NOTE: You can reduce the GPU memory requirement by using mixed precision training. Instead of single-precision floating-point numbers, most calculations can be done with half-precision floats (16 bits instead of 32). Note that this only works if you have a recent GPU and the most recent CUDA drivers.
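In fastai v1 this is a one-line change on the learner:

```python
# to_fp16() switches most computation to half precision (16-bit floats).
learn = unet_learner(data, models.resnet34, metrics=acc_camvid).to_fp16()
```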
Any time you want to predict some continuous value (in this case, the coordinates of the centre of a face), you can do image regression in the same manner as well.
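A rough sketch of how this looks in fastai v1; get_ctr (a function returning the face-centre coordinates for an image) is a hypothetical helper in the spirit of the lesson's head-pose notebook:

```python
# get_ctr is assumed: it maps an image to the target centre coordinates.
data = (PointsItemList.from_folder(path)
        .split_by_rand_pct(0.2)
        .label_from_func(get_ctr)
        .transform(get_transforms(), tfm_y=True, size=(120, 160))
        .databunch().normalize(imagenet_stats))
learn = cnn_learner(data, models.resnet34)  # continuous targets -> MSE loss
learn.fit_one_cycle(5, slice(1e-2))         # illustrative values
```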
We import from fastai.text.
The way to create a databunch from a CSV of text is TextDataBunch.from_csv.
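For example, using the IMDB sample from the lesson:

```python
from fastai.text import *

# Assumes a CSV with a label column and a text column, as in the lesson.
path = untar_data(URLs.IMDB_SAMPLE)
data = TextDataBunch.from_csv(path, 'texts.csv')
```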
The first thing we do is tokenization. During tokenization we convert everything to lowercase, take care of punctuation, separate contractions (for example, "didn't" becomes "did" + "n't"), clean the text of any HTML, and convert each word into a token.
Once we have extracted the tokens, we convert them into integers by creating a list of all words used at least twice. This is called numericalization.
We can now replace every movie review with a list of numbers.
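You can inspect both steps directly on the databunch; these are standard fastai v1 accessors:

```python
data.show_batch()            # tokenized text, with special tokens like xxbos/xxmaj
print(data.vocab.itos[:10])  # numericalization: the int-to-token mapping
```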
Then we create a learner object, and the process is mostly the same as before.
When using a dataset that is very different from ImageNet (like satellite or X-ray images) for transfer learning, should we normalize using the same stats the pretrained model was trained with?
We should always use the same stats as the pretrained model because that model was trained with those stats.
If we don't, the unique characteristics of our own dataset won't appear to the model anymore, because we will have normalized them out.
For example, ImageNet expects frogs to be green. If you normalize the red, green, and blue channels with your own dataset's stats, the frogs will appear grey to the model and it won't be able to identify them.
How does tokenization work when words depend on each other, like San Francisco?
- With deep learning, you don't have to worry about that; complex feature engineering like this vanishes. An RNN can learn that San + Francisco together have a different meaning than they do separately.
How do you use pretrained models when your dataset has 2 or 4 channels instead of 3?
For 2 channels, you create a 3rd channel that is either all zeros or the average of the other two channels.
For 4 channels, you have to modify the model itself; this will be covered in later lessons.