Machine Learning 101

The idea of this blog is to cut out as much jargon as I can, give an overview of Machine Learning, and explain why we opt for it.

Let's jump in.

"A field of study that gives computers the ability to learn without being explicitly programmed to do so." (Arthur Samuel, 1959)

Enough with the definition; you hear this one almost everywhere.

Let's understand, once and for all, how it differs from traditional programming.

For instance, the spam filter in our mail works on the principle of machine learning. Given examples of spam emails (flagged by users) and examples of non-spam emails (regular mail), a spam filter built with machine learning learns to flag incoming spam.

Alright, you might ask me, "Ash, I can do this with traditional programming too."

Absolutely you can. But if there's an approach that eases up your job, would you still take the hard road?

First of all, we have to find a way to tell the program what spam mail looks like. Spam emails tend to have words or phrases (such as "4U", "credit card", "free", "amazing", etc.) that come up a lot in the subject line, and perhaps other odd terms in the sender's name, the body of the mail, and so on.

Then we would write an algorithm (with if-else rules) for each pattern we noticed, something like the sketch below.
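
This is just a minimal sketch in Python; the rule words and the threshold are made up purely for illustration:

```python
# A naive rule-based spam filter: one hand-written rule per pattern.
def is_spam(subject: str, body: str) -> bool:
    text = (subject + " " + body).lower()
    hits = 0
    if "4u" in text:
        hits += 1
    if "credit card" in text:
        hits += 1
    if "free" in text:
        hits += 1
    if "amazing" in text:
        hits += 1
    return hits >= 2  # arbitrary threshold: two or more patterns means spam

print(is_spam("Amazing offer 4U", "Get a free credit card today"))  # True
```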

Alright, don't judge my programming skills; I tried to keep it easy to understand. Likewise, every pattern we decide signals spam would go through a process similar to this.

As we keep adding more odd terms and patterns to our rule list, the program will likely grow into a long list of complex rules.

Alright, what if spammers notice that all their emails containing "4U" get blocked? Once they figure out how smart you are, they can change the convention a bit, from "4U" to "For U", which bypasses the program you wrote.

The code we wrote before has practically zero chance of detecting this change in wording. If spammers keep working around your spam filter, you will need to keep writing new rules forever.

Machine learning is smarter than those spammers. It automatically learns which words and phrases signal spam by picking up the unusually frequent patterns of words in spam emails versus non-spam emails.

For instance, a machine learning model notices that the emails you flag as spam tend to contain words like "4U", "credit card", "free", and "amazing", so new emails containing these kinds of words get classified as spam.
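
To make that concrete, here is a minimal sketch of a learned spam filter using scikit-learn's CountVectorizer and naive Bayes; the toy emails and labels are made up for illustration:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# Tiny made-up dataset: 1 = spam, 0 = not spam.
emails = [
    "Amazing offer 4U: free credit card",
    "Free amazing prize, claim your credit card now",
    "Meeting notes from yesterday's standup",
    "Your invoice for March is attached",
]
labels = [1, 1, 0, 0]

# Turn text into word-count features, then learn which words
# show up unusually often in each class.
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(emails)
model = MultinomialNB().fit(X, labels)

# The model picked up the patterns itself; no if-else rules were written.
print(model.predict(vectorizer.transform(["free credit card 4U"])))  # [1]
```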

That's why, at times, normal emails end up in the spam folder. This tells us machine learning can't be 100% accurate; it makes some mistakes.

But it eases up the process, and that's all we want; a little loss is manageable. Models learn better the more underlying patterns you provide, which means more data.

(Figure from the Hands-On Machine Learning book.)

Then, when spammers get smart and change their conventions, machine learning is clever enough to notice that "For U", for instance, has become unusually frequent in mail flagged by users, and it starts flagging such emails without any human intervention.

So, in the previous sections, we built a basic intuition of what ML is. Now we will peek into the data side of things.

We haven't talked about data yet, and it is really important for any machine learning problem. Always remember: a machine learning model is just a puppet that stays idle while the puppeteer (the data) does the heavy lifting.

In every machine learning problem, there are three kinds of data: the training set, the validation set, and the test set.

Before getting into them, imagine how you prepare for your exams. Keep that scenario in mind; it might help us understand these splits better.

Our machine learning model is initially fitted to the training set. We always train our model on the training set; this is where the model learns.

To make it simple: before appearing for an exam, we usually prepare ourselves. This includes study materials, mock tests, etc.

The learning materials (or course materials) we consume while preparing for the exam can be called the training set. Here you get trained by learning everything needed for the exam; this is the initial step of your preparation.

This will be a subset of our whole dataset; in practice, people often take around 75% of their data as the training set. With that 75%, we train our model and prepare it as well as we can.
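
As a sketch, here is how the splits might be carved out with scikit-learn (we will meet the validation and test sets in a moment); the dataset is synthetic, and the 75/25 proportions just follow the rule of thumb above:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

# Synthetic stand-in dataset, just for illustration.
X, y = make_classification(n_samples=1000, random_state=42)

# Hold out 25% as the test set, untouched until the very end.
X_trainval, X_test, y_trainval, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42
)

# Split the rest again so we also get a validation set.
X_train, X_val, y_train, y_val = train_test_split(
    X_trainval, y_trainval, test_size=0.25, random_state=42
)
```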

We've learned something, but we've got to evaluate ourselves and figure out how well prepared we are, don't we?

In machine learning, we use a validation set to tune our model. In other words, this is where we adjust the model and evaluate it: is it doing fine? Did it learn well?

This is considered an important step in machine learning, and it drives us into an important topic called generalization. To say an ML model is doing well, it should be able to generalize.

Our model should perform well on data it hasn't seen before. It's like the mock tests you take before the exam: you solve problems you didn't come across while studying.

By solving those problems, we can evaluate where we stand. If our performance is bad, we go back to consume more materials and then return to the mock test. It is an iterative process indeed.

We use the validation set as our mock test to see how well the model has learned. If it hasn't, we go back to the training set, make some changes, train again, and evaluate on the validation set again.

The sample of data used to provide an unbiased evaluation of a model fit on the training dataset while tuning model hyperparameters is called a validation set.

For now, think of hyperparameters like the knobs on an oven: you tune them and then cook. We will come back to this in later blogs.
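
Continuing the sketch from the split above, tuning on the validation set might look like this; the model (k-nearest neighbors) and the candidate values are arbitrary picks for illustration:

```python
from sklearn.neighbors import KNeighborsClassifier

best_k, best_score = None, 0.0
for k in (1, 3, 5, 7, 9):                  # candidate hyperparameter values
    model = KNeighborsClassifier(n_neighbors=k)
    model.fit(X_train, y_train)            # learn on the training set only
    score = model.score(X_val, y_val)      # evaluate on unseen validation data
    if score > best_score:
        best_k, best_score = k, score

print(f"best k = {best_k}, validation accuracy = {best_score:.3f}")
```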

This is the set that remains untouched until the end of the machine learning workflow. After training and tuning, this is where you evaluate your final model and compare results.

The sample of data used to provide an unbiased evaluation of the final model fit on the training dataset. It is a set of examples used only to assess performance.

If you peek at the test set and use it to tune your model, your estimate of the generalization error will be too optimistic, and you may launch a system that does not perform as well as expected. This is called data snooping bias, and it is exactly why the test set stays untouched until the end.

By analogy, the test set is the exam itself, where you show up and perform with whatever you learned and practiced during the preparation.
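
To wrap up the sketch: once the hyperparameter is chosen, we retrain on everything except the test set and then touch the test set exactly once (this continues from the variables in the earlier snippets):

```python
from sklearn.neighbors import KNeighborsClassifier

# Retrain on train + validation with the chosen hyperparameter,
# then evaluate on the test set once for an unbiased final estimate.
final_model = KNeighborsClassifier(n_neighbors=best_k)
final_model.fit(X_trainval, y_trainval)
print(f"test accuracy = {final_model.score(X_test, y_test):.3f}")
```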

This was a basic intro to how we split our data. I left out certain things intentionally and will explain them in another series of blogs.

Things like cross-validation, hyperparameter tuning, overfitting and underfitting, etc. will be coming up. Hope you liked it! Leave feedback. The goal is to make things simpler; that's what I am trying to do with these blogs.

Have a great day! Keep Learning 🤖

