You can access the full course here: Build Sarah – An Image Classification AI
Transcript 1
Hello everybody, and thanks for joining me. My name is Mohit Deshpande, and in this course we’ll be building an image classification app.
Given a set of images, we’re going to train an AI to learn what these images are, and then we can actually assign them labels. So, you can see kind of what our data set is gonna look like: you have things like trucks, cats, airplanes, deer, horses, and whatnot. And so, what we will be building is an AI that can actually classify these images and assign them labels so that we know what’s in the image. So, the big topic here is all about image classification.
So first, I want to introduce you to what image classification is, in case you’re not familiar with it. I will also do a quick intro to machine learning as well. The first approach that we’re going to take is through this thing called the nearest neighbor classifier, so we’ll build the intuition behind how that works, and then write the code for that from scratch.
Then, we’ll move on to something a bit more general than that, and a bit better, and it’s called a k nearest neighbors classifier. So, instead of just the single nearest neighbor, you look at the top k closest neighbors; that’s kind of the intuition behind it, and I’m going to go into much more depth with that. For this we’re actually going to use a pre-built classifier, whose code is already written, because writing it from scratch can get kind of complicated. Then, we’re going to talk about hyperparameter tuning, because the question then is, how do we choose the value of k, what is k? So we’re going to be discussing how we pick these values and the approaches that we can take to get the best possible hyperparameters.
And finally, I also want to discuss the CIFAR-10 dataset. What’s really cool about CIFAR-10 is that it’s a very popular, widely-used, real dataset that people doing research in image classification use when they’re reporting their results. And so, it’s going to be really cool, because you’ll be using the same dataset that top researchers have used before. So, we’ll also be looking at that CIFAR-10 dataset. We’ve been making video courses since 2012, and we’re super excited to have you onboard. Online courses are a great way to learn new skills, and I take a lot of online courses myself. Zenva courses consist mainly of video lessons that you can watch at your own pace and as many times as you want.
All the source code that we make is downloadable, and one of the things that I want to mention is that the best way to learn this material is to code along with me. So, we highly recommend that you code along so that you can better learn the material, because there’s a big difference between watching someone code and coding yourself. And finally, we’ve seen that the students who get the most out of these online courses are also the students who make a weekly planner or a weekly schedule and stick with it, depending on your own availability and your learning style. Remember that you can watch and rewatch these video lectures as many times as you want, so that really gives you more flexibility.
At Zenva we’ve taught programming and game development to over 200,000 students, across over 50 courses, since 2012. These students have used the skills that they’ve learned in these courses to advance their careers, start up a company, or publish their own apps and games. Thanks for joining, and I look forward to seeing the cool stuff you’ll be building. Now, without further ado, let’s get started.
Transcript 2
Hello everybody, my name is Mohit Deshpande and in this video I wanna give you guys an overview of machine learning.
We’ll talk a little bit about where it came from, and towards the end I just wanna list a few different subfields within machine learning that have a lot of ongoing research going into them. So that’s what I’m gonna be talking about in this video.
So let’s get started. So before we had machine learning, or actually just artificial intelligence in general, AI, computers were very unintelligent machines. Even though they were really good at computing large numbers or performing large computations, and they could do those really fast, they had to be told exactly what to do.
And so that’s something worth writing down: before AI, computers had to be told exactly what to do. You had to account for every possible input or change in your machine state, every single possibility. And that became tedious very fast, because there were cases where this becomes incredibly time consuming, having to hard code in your program all of these possible configurations or possible inputs. And that also adds to the length of your program.
And so way back then, before AI, that was just something you had to do, or you had to have some sort of fail-safe condition or something like that. But then people started asking the question: instead of telling computers exactly what to do each time, can we teach them to learn on their own? As it turns out, there was a lot of stuff going on in science fiction in particular. Authors and writers in science fiction were starting to depict robots as sentient beings; they looked like mechanical men, I guess, is what the term was, but eventually that turned into robots.
And they had all this futuristic stuff with robots, like they could greet you and shake your hand, and they had this repository of knowledge that they could draw from, and they were sentient, they knew their own existence, and they learned. And that’s probably the most important aspect that AI researchers were taking from science fiction: that robots could learn. So then they started getting into, how can we model knowledge, and how can we get some kind of representation with which to learn? And that starts getting into this period of time when we were doing stuff called classic AI. And that was actually more centered around intelligent search instead of actual learning.
So what I mean by that is, let’s suppose that we were playing a game, something simple that we all know, like tic-tac-toe. Let’s say that I am the blue circles. Suppose I play a move here, and then it’s the computer’s turn, and so the computer has eight possible places where it can put an X.
So what classic AI was trying to do is it will try every one of these possible combinations and then it’ll try to predict. So if the X was put here, for example, then after that X was played it’ll try to predict what my move is. Then maybe I’ll play something like this, and then from there the AI could play one of six different moves. And so it tries each one of them, and eventually you get this giant search space, basically, where you’re looking at every single possible way that the game could be played out from the human just playing a single O here. I mean, there are so many possible combinations.
And as it turns out, there are different techniques where you can actually get this working reasonably well. You can write an AI to play tic-tac-toe with you such that it will choose the best move to try to prevent you from winning. That’s called a minimax strategy. But anyway, you can build this, and it’s actually not that hard to do, and it runs reasonably fast. So this is something you can build, but this is for something like tic-tac-toe, a really simple game.
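Just to make that concrete, here is a minimal sketch of what minimax looks like in code for tic-tac-toe. It’s only an illustration of the kind of exhaustive search classic AI relied on, not part of the image classification material, and the board representation and scoring are simplified assumptions of mine.

```python
# Minimal minimax sketch for tic-tac-toe (illustrative only).
# The board is a list of 9 cells, each "X", "O", or None.

def winner(board):
    lines = [(0,1,2), (3,4,5), (6,7,8), (0,3,6), (1,4,7), (2,5,8), (0,4,8), (2,4,6)]
    for a, b, c in lines:
        if board[a] is not None and board[a] == board[b] == board[c]:
            return board[a]
    return None

def minimax(board, player):
    """Best achievable score for X (the maximizer) from this board state."""
    win = winner(board)
    if win == "X":
        return 1
    if win == "O":
        return -1
    if all(cell is not None for cell in board):
        return 0  # draw
    scores = []
    for i in range(9):
        if board[i] is None:
            board[i] = player
            scores.append(minimax(board, "O" if player == "X" else "X"))
            board[i] = None  # undo the move and try the next one
    # X picks the move with the highest score, O picks the lowest
    return max(scores) if player == "X" else min(scores)

# From an empty board, perfect play by both sides is a draw (score 0)
print(minimax([None] * 9, "X"))
```

Even for a 3x3 board this explores hundreds of thousands of positions, which is exactly why the same brute-force idea does not scale to games like chess or Go.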
Imagine if we had something like chess, where it’s not just eight possible moves, it’s so, so many moves, tons and tons of moves on the chess board. And as it turns out, back in 1997, one of IBM’s machines, Deep Blue, actually ended up beating the world chess champion.
And so trying to do this classic AI stuff with search, when it comes to large games like chess, or even larger games, like the ancient Chinese game called Go, which has even more possible moves than chess, at some point it just breaks down. The number of possible ways a game could be played out is so big that it would either, one, use up all the RAM on your computer and crash, or two, computing all of this stuff would take much, much longer than you could actually spend playing a game. And so search is not a great approach, but back then it was the only viable option.
There was some dabbling going on in actual learning, but a lot of the stuff in classic AI was using search, different kinds of searching algorithms, so you could have it play tic-tac-toe or chess or something. But relatively recently, I should say, there’s been this move away from search and towards actual learning.
So we move towards actual learning. Instead of looking at all possible configurations, we start training an AI, we start teaching an AI by giving it lots of example data that it can draw from, so that when it gets new input data it knows, because it’s seen previous data, what to do with this new problem. So if you have a particular problem, when you’re training an AI you give it lots of examples of the problem, and then it can start learning ways to approach that problem. I am speaking in an abstract sense here because I wanna make this as general as possible; there are a lot of different subfields, and I don’t wanna get too specific, because then it won’t apply to some of them.
But right, so when we’re trying to solve a problem, we train an AI, and then the AI has seen examples of how to solve the problem, so from new input it can reason through how to solve that problem. So that’s a broad-level overview of machine learning. But there are actually a few subfields within this.
Machine learning itself is a fairly big field. So there’s research going on into, I’m sure you’ve heard of, neural networks, I think they’ve been in the news at some point. Neural networks take the more biological route and try to model what’s going on in our brains. Albeit it’s a very overly simplistic model, it’s still a model, and it turns out that it works really well. It also turns out we can break down neural networks into things like language with recurrent neural networks, or vision with convolutional neural networks.
But we could branch this off even further. There are people researching deep learning specifically, and that’s related to neural networks, but with deep learning the issue is how deep we can make these neural networks, how many layers we can go, and what kind of challenges we encounter as we make these layers really deep. And so they’re trying to find solutions for that. There’s also stuff going on with reinforcement learning, which is pretty popular. Reinforcement learning is very popular for teaching AI to play games, actually; I think if you look around, there’s an AI that can actually play through the original Super Mario Bros. game.
And reinforcement learning lets you build that kind of model. I think they’ve also built reinforcement learning models that can play Asteroids and a ton of the old Atari games fairly well, too. So right, these are just some of the subfields. I can’t possibly list all of them because it’s a really big field, but we’re just gonna stop right here and do a quick recap.
So with machine learning: before AI, computers weren’t very intelligent, we had to tell them exactly what to do, and this became impossible in some cases because you can’t think of all the possible configurations or inputs that you can get. And so this is when we started getting into classic AI. But even with classic AI we were really just doing searching, we weren’t actually learning anything. Now we’ve moved from search towards learning, where we actually learn knowledge representations and use those.
And I just mentioned a couple subfields of machine learning here with neural networks, deep learning and reinforcement learning to show you that this is a very popular field at this point and it’s a very, very rapidly expanding field. And you can definitely expect many more cool advances to come in the future.
Transcript 3
Hello, everybody, my name is Mohit Deshpande, and in this video I want to introduce you guys to one particular subfield of machine learning, and that is supervised classification. Classification is a very popular thing to do with machine learning.
So, let me actually define this. Classification is the problem of trying to fit new data, or, to make it a bit more specific, to fit or label new data based on previously seen data. This seems like kind of a weird description at this point, but with classification, the task is: we’ve seen a lot of data and it’s labeled, and given some new data, we want to give it a label based on some of the previously labeled data that we’ve seen. I should also mention that with classification, we have discrete classes or labels for each data point or input. So, let me illustrate this with an example.
Suppose I have a scatter plot over here or something. Let me just add in some stuff here. So, I’m adding in a ton of red X’s, and then we’ll add blue circles over here. This data is labeled, so these would correspond to actual points, with this axis for the X direction and this axis for the Y direction. I haven’t actually plotted all the points, but trust me, they correspond to actual points, and you can see I’ve labeled them. There are only two classes so far, the red X and the blue circle, but if I wanted to, I could add some other class, like a green triangle. We’ll add a couple of green triangles or something, up here.
So now there are three classes, and I have all these points and they’re labeled. And so the problem with classification is: now that I have these points, if I received some new point, what label would I assign to it? Would I assign it a red X, a blue circle or a green triangle? What we’re trying to do with classification is to find a way to build a model so that, given this new input, we can actually assign it one of these labels.
So, let’s just do a human, intuitive example kind of thing. Suppose my new point, I’m gonna put it in, let’s see, purple. Suppose my point was in here or something; suppose this is my new point, here. With this being my new point, I would ask the classifier, what label should I assign to this? Should it be a blue circle, a red X or a green triangle?
And so, as a human, if you were thinking about this, if I gave you this point and I asked you what you would assign it, you would say, "Well, I would assign it as a blue circle." And I would ask you, "Well, wait a minute. Why would you assign it as a blue circle?" And you’d say something probably along the lines of, "Well, if I look at what’s around it, there are lots of blue circles around here." And it turns out that this region of the plane here tends to have more blue circles than red X’s, so I can try to carve out this portion over here, since there seem to be a lot of blue circles. So, this is probably what I would assign this point, and it turns out that if you were to give this to a classifier, it would probably give this a blue circle.
So I say, "All right. Now, what about a point over here?" And you would say, "Well, I would give that a red X." When I ask you again, "Why would you give it a red X?", you give the same kind of answer. You say, "Well, in this portion of the plane over here, around that question point, around that new input, there are a lot of red X’s, and so I would think that it would be most likely to be a red X." And that’s right. Now I can do the same thing where I say I have a point up here or something, and you’d say, "Well, in this part of the plane over here, you’re more likely to encounter a green triangle than you are any of the others."
And so I would probably say that that new point should be a green triangle. This is kind of like the thought process that is going on with these classifiers, and what you used to make your decision was this kind of imaginary boundary that I kind of drew in here.
This imaginary boundary between our data is called the decision boundary. The decision boundary, right here, helps us make decisions when it comes to supervised classification, because we can take any sort of input data, find some way to put it on a plane like this, and then find what the decision boundary is and plot it. And so, with a lot of classification algorithms, what they try to do is find this boundary; that’s what they’re all concerned about, because once you have this boundary, then if you get a new point, it’s fairly easy to classify.
You can say, "Well, this portion of the plane is blue, this part is red, this part is green," so if you get a point that is inside one of these boundaries, you just give it the label of what’s around there. And so this is what supervised classification algorithms try to find: some kind of boundary. It might not be the case that you have such nice, two-dimensional data like this, but there are ways that you can fit it onto a plane.
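To make that "look at what’s around it" intuition concrete, here’s a tiny sketch, using made-up 2D points, that labels a new point with whatever labeled point it sits closest to. The coordinates and label names are just placeholders standing in for the red X’s, blue circles, and green triangles from the drawing.

```python
import math

# Made-up labeled 2D points standing in for the classes in the drawing
points = [((1.0, 1.0), "red_x"), ((1.5, 0.5), "red_x"),
          ((5.0, 5.0), "blue_circle"), ((5.5, 4.5), "blue_circle"),
          ((1.0, 8.0), "green_triangle"), ((2.0, 8.5), "green_triangle")]

def classify(new_point):
    # Take the label of the single closest labeled point (nearest neighbor)
    closest = min(points, key=lambda p: math.dist(p[0], new_point))
    return closest[1]

print(classify((5.2, 4.8)))  # lands among the blue circles -> "blue_circle"
```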
I’m not gonna get into that too much, but here’s a question. What if my point was, like, right over here? Then it’s not so obvious whether it is a blue circle or a red X, and so there’s some inherent confidence value, or some measure, that says, "I think that this is a blue circle with this confidence, or with this probability." And even the points that we were classifying before, even though it seemed kind of obvious that around them there are blue circles, there is some inherent uncertainty about this. It turns out that, for each of those points, there is a chance that it could have been a red X or it could have been a green triangle, but that chance was very, very low, and we only assigned it the label that has the maximum chance. So, it’s not necessarily the case that this must be a blue circle; instead, we say that this was, with high probability, a blue circle, and so you can’t be 100% certain.
If you look at this point over here, it becomes clear that it could be a red X or it could be a blue circle. It just kind of depends on what this boundary specifically looks like, but given new inputs, I want to be able to give them one of these labels.
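One simple way to put a rough number on that confidence, again with made-up placeholder points, is to look at the k closest labeled points and report what fraction of them agree on a label. This is just a sketch of the idea, not the exact measure that any particular classifier uses.

```python
import math
from collections import Counter

# Made-up labeled points, same idea as before
points = [((1.0, 1.0), "red_x"), ((1.5, 0.5), "red_x"), ((2.5, 2.5), "red_x"),
          ((5.0, 5.0), "blue_circle"), ((5.5, 4.5), "blue_circle"), ((4.0, 4.0), "blue_circle")]

def classify_with_confidence(new_point, k=3):
    # Look at the k closest labeled points and let them vote
    nearest = sorted(points, key=lambda p: math.dist(p[0], new_point))[:k]
    votes = Counter(label for _, label in nearest)
    label, count = votes.most_common(1)[0]
    return label, count / k  # fraction of neighbors that agree, as a rough confidence

# A borderline point: here only 2 of the 3 nearest neighbors agree on the label
print(classify_with_confidence((3.2, 3.2)))
```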
And this is where I’m going to stop, right here, and I’ll do a quick recap. So, supervised classification is a subfield of machine learning where the problem that we’re trying to solve is: we have labels on our input data, and now that we’ve seen that data, given some new input, we want to give it a label based on the labels that we already have. That is the problem of supervised classification.
We want to fit or label some new input based on what we have already seen before. And so I gave this example of, if we had red X’s, green triangles and blue circles, given a new point, how would you figure out which one of these categories it belongs to? We use these things called decision boundaries to try to figure that out. So, that is supervised classification.
There are tons and tons of algorithms that can do this. Some of them work better than others; it all depends on what kind of data you’re looking at. But the point is that there are lots of different algorithms for this, and so you can take a look around and see if there’s one that you want to know more about. But anyway, this is the problem of supervised classification.
Transcript 4
Hello, everybody. My name is Mohit Deshpande, and in this video I want to define this problem called image classification, and I want to talk to you about some of the challenges that we can encounter with image classification, as well as get some definitions out of the way and discuss image classification a bit more concretely.
So first of all, I should define what image classification is. What we’re trying to do with image classification is assign labels to an input image. So this fits the scheme of supervised classification in general: given some new input, we want to assign some labels to it. There are some challenges specific to images that we have to talk about, but before we really get into this, I want to remind you that images just consist of pixels. So remember that the computer just sees this grid of pixels, and what we’re trying to do is give this image a label like "bird", for example.
Suppose I have an image of a bird over here or something like that. I have some picture of a bird, and what I want to do is give this to my classifier, and my classifier will tell me that the label that can most closely be tied to this image is "bird". That’s the goal of image classification: we’re trying to add some higher-level meaning to the image. In fact, we’re trying to determine what is inside of an image, and that’s what these labels are.
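Just to show what "the computer sees a grid of pixels" means in practice, here’s a quick sketch that loads an image into an array of pixel values. The filename is a placeholder, and I’m assuming you have the Pillow and NumPy packages installed.

```python
import numpy as np
from PIL import Image  # assumes the Pillow package is installed

# "bird.jpg" is a placeholder filename for whatever image you want to inspect
img = np.array(Image.open("bird.jpg"))

print(img.shape)  # e.g. (height, width, 3): rows and columns of pixels, 3 color channels
print(img[0, 0])  # for a color image, the top-left pixel is just three numbers: red, green, blue
```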
These labels tell us what is inside of the image. Not just random labels; for image classification we’re particularly interested in what is inside of this image. But this isn’t an easy problem by any means, and so there are some challenges that are specific to image classification.
I just want to talk about a couple of these challenges; we won’t get to all of them. One particular challenge here is scaling, and that is, if I have a picture of a small bird, and then I feed my classifier the same picture but maybe doubled in size, my classifier should be robust to this. I should be able to take an image, and there shouldn’t be any dependence on size. If I give you a picture of a small bird, I can give you a picture of a large bird, and it should be able to figure out either which bird that is, or that this is a bird.
So I should probably define some of these class labels. Suppose my class labels are something like "bird", "cat", or "dog", just as example class labels. So if I give it a picture of one of these things, then whether it’s a big dog or a small dog, it should be able to identify it as a dog. If I give it a picture of a small cat or a large cat, it should still be able to identify it as a cat. And so there are challenges with scaling.
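As a small aside, one common preprocessing step related to scale is to resize every input image to one fixed size before handing it to the classifier, so the classifier always sees the same number of pixels. That alone doesn’t make it robust to big versus small birds inside the frame, but it’s a typical first step. The filename and target size here are just assumptions of mine.

```python
from PIL import Image  # assumes the Pillow package is installed

# Placeholder filename; 32x32 is an arbitrary target size (it happens to match CIFAR-10)
img = Image.open("dog_photo.jpg")
small = img.resize((32, 32))        # every image now has the same dimensions
small.save("dog_photo_32x32.jpg")
```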
There’s this other challenge called occlusion. Occlusion is basically when part of the image is hidden, hidden behind something else. That would be like if I had a picture of a bird and maybe a branch or something is in the way and it’s covering up this portion here. We want our classifier to be robust to things like occlusion. This is a pretty big challenge, because depending on what part of the object you can see, we still have to make our classifier robust to it.
So occlusion is when part of an image is hidden behind something else, like this tree branch that’s blocking half of my bird or something. I still want to classify this as a bird, so that’s the challenge of occlusion. I guess we can do one more. Another good one is illumination, and illumination is basically lighting. Depending on the lighting conditions when the input image was taken, I still want to be robust to that kind of thing.
I don’t want my image to be classified poorly because my cat is standing in sunlight or my cat is in darkness, or because it was a cloudy day when my bird picture was taken. I want my classifier to also be robust to illumination. There are so many more challenges with image classification, and that makes it kind of difficult, so there’s still research going into finding ways to be more robust to some of these challenges.
To build a really good classifier, we need to take a data-driven approach. What I mean by that is we basically give our AI tons of labeled examples. So for example, if we were building something that differentiates between these three classes, we would give our AI tons of images of birds and tell it that these are birds. We give our AI tons of pictures of cats and say, "This is a cat". We give our AI tons of pictures of dogs and we say, "This is a dog".
Alright, so with the data-driven approach, we want to give our AI labeled example images, and these labeled images are also commonly called ground truth. They’re called ground truth because when we go to evaluate the classifier, we actually compare what the classifier thinks an image is to the actual truth of what the label on the image is. So we compare the prediction to the ground truth and ask how well our classifier is performing.
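As a tiny illustration of that comparison, here’s a sketch that measures how well a classifier is doing by checking its predictions against the ground truth labels. The two lists are made-up placeholders.

```python
# Made-up ground truth labels and classifier predictions for six images
ground_truth = ["bird", "cat", "dog", "dog", "cat", "bird"]
predictions  = ["bird", "cat", "cat", "dog", "cat", "bird"]

# Accuracy: the fraction of images where the prediction matches the ground truth
correct = sum(p == t for p, t in zip(predictions, ground_truth))
accuracy = correct / len(ground_truth)
print(f"Accuracy: {accuracy:.2f}")  # 5 of 6 correct -> 0.83
```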
So yeah, we want this to be data-driven. We take this approach by giving our AI lots of labeled example images, and then it can learn some features off of that. If you want to take this approach, however, you can’t just give it two images of a bird, or two of each class, and be done with it, right?
The more good training data that you have, the more high-quality training data that you give your AI, the more examples that you give your AI, the better it will be at discriminating between bird, cat, and dog. To make that distinction between these classes, you want to give lots of high-quality examples to your AI.
And I’m going to talk a little bit more about this, but this data set is sometimes something you have to collect yourself, although there are also tons of image classification data sets online. I mean, ImageNet has a few million images across tons of different classes. There are much smaller data sets too, of course; there’s the CIFAR-10 data set, which has 10 different classes and, I think, around 60,000 images. But the point is that lots of good-quality training data is always preferable to some super complicated classification algorithm.
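If you want to peek at CIFAR-10 yourself, here’s one way to pull it down, assuming you have TensorFlow installed, since Keras ships a loader for it. This isn’t the only way to get the dataset, just a convenient one.

```python
from tensorflow.keras.datasets import cifar10  # assumes TensorFlow is installed

# Downloads the dataset the first time and caches it locally
(x_train, y_train), (x_test, y_test) = cifar10.load_data()

print(x_train.shape)  # (50000, 32, 32, 3): 50,000 training images, 32x32 pixels, 3 color channels
print(x_test.shape)   # (10000, 32, 32, 3): 10,000 test images
print(y_train[:5])    # integer class labels from 0 to 9
```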
So that kind of illustrates that with image classification, we want this to be data-driven. There’s no way to hard code this for every bird or for every cat or for every dog. Hard coding would not be a good approach, so we’re taking the more data-driven approach by giving our classifier lots of examples with labels on them, so it can learn what a bird looks like and what a cat looks like, and so on.
So that’s where I’m going to stop right here, and I’m just going to do a quick recap. With image classification, we want to give labels to an input image based on some set of labels that we already have. So suppose I have three labels like "bird", "cat" and "dog"; given a new input image, I want to say whether it’s a bird, a cat, or a dog, that is, I want to assign it that label. The computer only sees the image as pixels, so we have to find some way to build a classifier out of just these pixel values, and there are lots of challenges that come with that.
Like I mentioned, there’s scaling: if you have a big bird or a small bird, you want to be able to still say that it’s a bird. There’s occlusion: if I have a tree branch in the way, or something like that, I still want to classify this as a bird. There’s illumination: if I have a dog standing in direct sunlight, as opposed to a dog in a darker room or something, I still want to classify that as a dog. And that also gets into other challenges, like what’s going on in the background. You want a fairly sterile background when you’re getting training data. You don’t want a lot of background clutter, because that could mess up your classifier; it might learn the wrong thing to associate with the label that you’re trying to give.
But anyway, moving on. A good approach to doing this is the data-driven approach, and that is, we give our AI lots of labeled example images. We give it lots of images of birds and tell it that this is what a bird looks like. We give it lots of images of cats and we say, "This is what a cat looks like", and so forth for a dog and for any other classes that you might have. We give it these example images and it will learn some representation of what a bird is, what a cat is, and what a dog is, and given that, it can generalize. When you have a new input image, it will do its function, and that is to give it one of these labels.
So I’m going to stop right here. What we’re going to do in the next video is talk about probably the simplest kind of image classifier, called the nearest neighbors classifier. So I’m going to talk about that in the next video.
Interested in continuing? Check out the full Build Sarah – An Image Classification AI course. You can also check out our Machine Learning Mini-Degree and Python Computer Vision Mini-Degree for more Python development skills.