What is machine learning? I could bore you with textbook definitions. Instead, let me use a familiar example.

A few days ago, I was teaching a child how to count. This is what transpired:

**Child:** 1, 2, 3, 4, 5, 3, 8…

**Me:** Stop. 1, 2, 3, 4, 5 is correct, but what comes after 5?

**Child:** 3.

**Me:** No, 6 comes after 5. Can you say six?

**Child:** Six.

**Me:** Ok, good. Repeat after me: 1, 2, 3, 4, 5, 6….

**Child:** 1, 2, 3, 4, 5, 3….

We repeated this exchange several times. Eventually, the child *learned* that 6, not 3, comes after 5.

**What is machine learning?**

So, what is machine learning, and what does it have in common with counting?

Let me introduce some terminology before I offer a very simplified explanation of machine learning.

First, let’s call the child’s sequences of numbers the hypotheses. For example, the (incorrect) sequence {1, 2, 3, 4, 5, 3, 8…} is one **hypothesis**. Second, let’s call the correct sequence {1, 2, 3, 4, 5, 6, 7, 8…} the **target**. Lastly, let’s call the difference between the hypothesis and the target the **cost function**.
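With these terms in hand, the child's mistakes can be counted mechanically. Here is a toy sketch (the sequences are from the example above; treating the cost as a simple mismatch count is my simplification):

```python
# The child's (incorrect) hypothesis vs. the correct target sequence
hypothesis = [1, 2, 3, 4, 5, 3, 8]
target     = [1, 2, 3, 4, 5, 6, 7]

# A simple cost function: count the positions where they disagree
cost = sum(h != t for h, t in zip(hypothesis, target))
print(cost)  # 2 -- the "3" and the "8" are wrong
```

Five of the seven numbers match, which is where the roughly 70% accuracy figure later in this post comes from.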

Now, observe that:

- I knew the correct sequence (the target), but the child did not.
- Initially, the child’s hypotheses were incorrect.
- By iteratively correcting the error between the child’s hypothesis and the target, the child *learned* how to count.

This scenario loosely illustrates a central idea in machine learning: iteratively minimizing a **cost function** – the difference between the target and the hypothesis. In the branch of ML known as supervised learning, an algorithm attempts to determine the correct hypothesis relating known training data to known labels. The machine *learns* if that relationship can then be generalized to unlabelled data with an acceptable level of accuracy.
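The "correct the error a little at a time" idea can be sketched as gradient descent on a toy cost function. The numbers below are hypothetical, chosen to echo the counting example (the hypothesis starts at 3 and the target is 6):

```python
# Toy gradient descent: nudge a single-parameter "hypothesis" w
# toward the target by repeatedly reducing a squared-error cost.
target = 6.0
w = 3.0              # initial (wrong) hypothesis, like the child's "3"
learning_rate = 0.1

for step in range(100):
    cost = (w - target) ** 2       # cost function: squared difference
    gradient = 2 * (w - target)    # derivative of the cost w.r.t. w
    w -= learning_rate * gradient  # correct the hypothesis a little

print(round(w, 3))  # w ends up very close to the target, 6.0
```

Each pass shrinks the error by a constant factor, just as each round of correction brought the child's sequence closer to the right one.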

**Digit classification**

Here is a real-world problem solved with machine learning: MNIST digit classification. The MNIST database consists of 70,000 images of handwritten digits, as shown below. The goal is to develop an algorithm that learns each digit: show the machine an image of a handwritten figure eight, and it should correctly classify the digit as an 8. This problem is the “hello, world” of machine learning and is typically one of the first attempted with advanced ML techniques.

The digit classification task, reminiscent of teaching a young child how to count, is a machine learning application of computer vision. Other basic computer vision applications include cheque readers, license plate readers, and bar code scanners. More advanced applications include autonomous vehicle navigation, tumour/disease detection in medical scans, face/identity detection in security systems or Facebook, and emotion detection by robots.

Initially, the child in our example had an accuracy of about 70% (5/7 correct in the initial hypothesis) but eventually scored 100%. Similarly, using a machine learning technique called softmax regression, I correctly classified 93.9% of the digits in the MNIST database. While this may sound impressive, it is far from the best that ML can achieve: most recent MNIST classifiers exceed 99.5% accuracy. In future posts, we will apply more robust machine learning techniques to the MNIST problem and compare our accuracies.
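For readers who want to try this themselves, here is a minimal sketch of softmax regression on a digits dataset. This is not the exact pipeline behind the 93.9% figure above; it uses scikit-learn's small built-in 8×8 digits dataset (a stand-in for full MNIST, so no download is needed) and its `LogisticRegression`, which performs multinomial (softmax) regression for multiclass targets:

```python
# Softmax regression on scikit-learn's built-in handwritten digits dataset.
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

digits = load_digits()  # 1,797 images of 8x8 handwritten digits, labels 0-9
X_train, X_test, y_train, y_test = train_test_split(
    digits.data, digits.target, test_size=0.25, random_state=0)

# For 10 classes, LogisticRegression fits a multinomial (softmax) model
model = LogisticRegression(max_iter=5000)
model.fit(X_train, y_train)

accuracy = model.score(X_test, y_test)  # fraction of test digits classified correctly
print(f"Test accuracy: {accuracy:.3f}")
```

On this smaller dataset the model typically lands in the mid-90s for accuracy, in the same ballpark as the MNIST result quoted above.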

To conclude, machine learning and human learning have some similarities. The next time you are teaching a child their abc’s or their 123’s, you may unwittingly use a cost function, one of the central concepts in machine learning.