Machine Learning

Prof. H. M. Tambol, Assistant Professor,

hmtamboli@coe.sveri.ac.in

CSE Department, SVERI’s COE, Pandharpur

Machine Learning is one of the most widely used techniques for predicting future outcomes and classifying information to help people make decisions. Machine learning algorithms are trained on examples, so they learn from past experience by analyzing historical data. Machine learning has become a central part of our lives as consumers, customers, and hopefully as researchers and practitioners.

The world today is evolving, and so are people’s needs and requirements. We are also witnessing a fourth industrial revolution, driven by data. To derive meaningful insights from this data and to learn from the way people and systems interact with it, we need computational algorithms that can process the data and provide results that benefit us. Machine learning is simply the study of computer algorithms that improve automatically through experience.

Types of ML Techniques:

Supervised Learning:

In supervised learning, the machine learns from labelled data. In other words, we have input and output variables, and the task is to learn a function that maps one to the other. The term “supervised learning” comes from the idea that the labelled training dataset acts as a teacher that supervises the algorithm. Here, the input is the independent variable and the output is the dependent variable. The goal is to learn a mapping function accurate enough that the algorithm can predict the output when we feed it new input. This is an iterative process: each time the algorithm makes a prediction, we check its performance, and if it is not good enough, we adjust the model and repeat.

Let’s take an example to understand supervised learning better.

Suppose we have a basket of fruit in which every fruit is labelled with its name. The machine first learns to associate each fruit’s shape and colour with its name. If it is later asked to identify grapes, it uses the prior knowledge gained from its training data (the basket of fruit).

It then applies that knowledge to the test data and returns the result.

Now, supervised learning can again be divided into two categories:

1. Regression

2. Classification

1. Regression

Since regression is a supervised learning technique, there is an input variable as well as an output variable. The point to keep in mind is that the output (dependent) variable is a continuous numerical value.

Let’s take this example to understand regression:

Let’s say you have two variables, “Number of hours studied” and “Number of marks scored”. Here we want to understand how the number of marks scored by a student changes with the number of hours studied, i.e. “Marks scored” is the dependent variable and “Hours studied” is the independent variable.

Based on this data, we now want to know: “How many hours should a student study to score 60 marks?” This is where regression techniques come in. If the data shows an increase of 10 marks for every extra hour studied, the regression model would learn that relationship and indicate that the student has to study for 6 hours to score 60 marks.

Note that “Marks scored” is the dependent variable and that it is a continuous numerical variable.
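The hours and marks example can be sketched in a few lines of Python. This is a minimal illustration, assuming a made-up dataset in which every extra hour adds exactly 10 marks; the function name fit_line and the data values are hypothetical, and the fit uses ordinary least squares, which is one common regression technique among several.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance of x and y, and variance of x (both unnormalised)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical training data: hours studied and marks scored
hours = [1, 2, 3, 4, 5]
marks = [10, 20, 30, 40, 50]

slope, intercept = fit_line(hours, marks)

# Predict marks for a new input (6 hours of study)
predicted_marks = slope * 6 + intercept

# Invert the fitted line to answer "how many hours for 60 marks?"
hours_for_60 = (60 - intercept) / slope

print(slope, intercept, predicted_marks, hours_for_60)
```

With this data the fitted slope is 10 marks per hour and the intercept is 0, so the model predicts 60 marks for 6 hours of study. Real data would be noisy, and the fitted line would only approximate the relationship.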

2. Classification

Classification algorithms also need both the input data as well as the output data. Here, the output variable or the dependent variable should be categorical in nature.

Let’s take this example to understand classification.

Consider these three variables: “Person has lung cancer or not”, “Weight of the person”, and “Number of cigarettes smoked in a day”. Here, we want to determine whether a person has lung cancer based on the person’s weight and the number of cigarettes he or she smokes in a day, i.e. “Having lung cancer” is the dependent variable and “Weight” and “Number of cigarettes smoked” are the independent variables.

Again, note that “Having lung cancer” is a categorical variable with two categories, “Yes” and “No”. Based on the independent variables, we classify whether the person has lung cancer or not.
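To make the idea concrete, here is a minimal sketch of a classifier for this example. It assumes a k-nearest-neighbours approach, which is only one of many classification algorithms, and the training data is entirely made up for illustration (it is not real medical data). The names TRAINING and knn_predict are hypothetical.

```python
import math
from collections import Counter

# Made-up training data for illustration only, not real medical data:
# ((weight in kg, cigarettes per day), "Yes"/"No" for having lung cancer)
TRAINING = [
    ((60, 0), "No"),
    ((72, 2), "No"),
    ((85, 0), "No"),
    ((65, 5), "No"),
    ((70, 25), "Yes"),
    ((58, 30), "Yes"),
    ((90, 20), "Yes"),
    ((77, 35), "Yes"),
]


def knn_predict(training, query, k=3):
    """Return the majority label among the k training points nearest to query."""
    # Sort the labelled examples by Euclidean distance to the query point
    nearest = sorted(training, key=lambda row: math.dist(row[0], query))[:k]
    # Count the labels of the k nearest neighbours and pick the most common
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


prediction_heavy_smoker = knn_predict(TRAINING, (68, 28))
prediction_light_smoker = knn_predict(TRAINING, (75, 1))
print(prediction_heavy_smoker, prediction_light_smoker)
```

For this toy data, the first query is classified as “Yes” and the second as “No”. In practice the features would normally be scaled first, because weight and cigarette counts are measured in different units.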

Unsupervised Learning:

In unsupervised learning, the machine learns from unlabelled data, i.e. the result for the input data is not known beforehand. The machine is trained on information that is neither classified nor labelled, and the algorithm tries to determine the underlying structure of the data without guidance. It groups unsorted information according to similarities, patterns, and differences without any prior training on labelled examples.

For example, suppose the machine is given images of pens and pencils without any labels saying which is which. It can still group the images into categories according to their similarities, patterns, and differences.
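Grouping by similarity can be sketched with k-means clustering, one widely used unsupervised algorithm. The sketch below is an assumption-laden illustration: the two-dimensional points stand in for hypothetical measurements of objects, the starting centroids are chosen by hand to keep the run deterministic, and the function name kmeans is made up for this example.

```python
import math


def kmeans(points, centroids, iterations=10):
    """Group points around centroids by repeated assign-and-update steps."""
    clusters = []
    for _ in range(iterations):
        # Assignment step: put each point in the cluster of its nearest centroid
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster
        # (an empty cluster keeps its previous centroid)
        centroids = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster))
            if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
    return centroids, clusters


# Hypothetical unlabelled measurements forming two loose groups
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5),
          (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]

# Start from two of the points as initial centroids
centroids, clusters = kmeans(points, [points[0], points[3]])
print(clusters)
```

No labels are supplied anywhere; the algorithm separates the first three points from the last three purely from their distances to one another.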

Reinforcement Learning:

In reinforcement learning, the algorithm learns through a system of rewards and punishments, and the goal is to maximize the total reward. There is no answer key that tells the learner what is right. Instead, the reinforcement learning agent decides how to act in order to perform its task, receives a reward or a punishment for each action, and gradually learns to take the actions that maximize the reward in a particular situation.
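A very small reinforcement learning setting is the multi-armed bandit, where an agent repeatedly picks one of several actions and only sees the reward of the action it chose. The sketch below assumes an epsilon-greedy strategy and made-up reward probabilities; the function name run_bandit and all parameter values are hypothetical, and a reward of 0 plays the role of a punishment.

```python
import random


def run_bandit(reward_probs, steps=5000, epsilon=0.1, seed=0):
    """Epsilon-greedy agent on a bandit with Bernoulli (0 or 1) rewards."""
    rng = random.Random(seed)
    estimates = [0.0] * len(reward_probs)  # learned value of each action
    pulls = [0] * len(reward_probs)        # how often each action was taken
    total_reward = 0
    for _ in range(steps):
        if rng.random() < epsilon:
            # Explore: try a random action
            arm = rng.randrange(len(reward_probs))
        else:
            # Exploit: take the action with the best estimate so far
            arm = max(range(len(reward_probs)), key=lambda a: estimates[a])
        # The environment hands back a reward (1) or nothing (0)
        reward = 1 if rng.random() < reward_probs[arm] else 0
        pulls[arm] += 1
        # Incremental update of the running average reward for this action
        estimates[arm] += (reward - estimates[arm]) / pulls[arm]
        total_reward += reward
    return estimates, pulls, total_reward


# Hypothetical environment: the third action pays off most often
estimates, pulls, total_reward = run_bandit([0.2, 0.5, 0.8])
best_arm = max(range(len(estimates)), key=lambda a: estimates[a])
print(best_arm, pulls, total_reward)
```

The agent is never told which action is best. It discovers that the third action has the highest payoff only by acting and observing rewards, which is the trial-and-error behaviour described above.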

APPLICATIONS

1. Image Recognition

It is one of the most common machine learning applications. For example, we could cluster the pixels of an image to discover its different regions and then segment the image based on those regions. Given a picture of a beach scene, this makes it possible to pick out the clouds, the sand, the sea, and the trees, which helps to make more sense of the image.

2. Medical Diagnosis

ML provides methods, techniques, and tools that can help in solving diagnostic and prognostic problems in a variety of medical domains. It is being used for the analysis of the importance of clinical parameters and of their combinations for prognosis, e.g. prediction of disease progression, for the extraction of medical knowledge for outcomes research, for therapy planning and support, and for overall patient management. ML is also being used for data analysis, such as detection of regularities in the data by appropriately dealing with imperfect data, interpretation of continuous data used in the Intensive Care Unit, and for intelligent alarming resulting in effective and efficient monitoring.

3. Statistical Arbitrage

In finance, statistical arbitrage refers to automated trading strategies that are typically short-term and involve a large number of securities. In such strategies, the user tries to implement a trading algorithm for a set of securities on the basis of quantities such as historical correlations and general economic variables. These measurements can be cast as a classification or estimation problem. The basic assumption is that prices will move towards a historical average.

4. Learning association

Learning association is the process of developing insights into the associations between products. A good example is how seemingly unrelated products may reveal an association with one another when analyzed in relation to the buying behaviour of customers.

5. Classification

Classification is the process of placing each individual from the population under study into one of several classes. The measurements taken of each individual are the independent variables. Classification helps analysts use measurements of an object to identify the category to which that object belongs. To establish an efficient rule, analysts use data consisting of many examples of objects together with their correct classification.

6. Mining Transaction

A popular application here is mining transactions. A transaction is a collection of items that are bought together. A set of items is often called an item set in the association rule mining community, and the first step is to find the frequent item sets.

You can then conclude that item set A implies item set B if both A and A ∪ B are frequent item sets. Since A and B are both sets of items, A ∪ B is also a set of items, so if both A and A ∪ B are frequent item sets, then you can say that item set A implies item set B.
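The notion of a frequent item set can be illustrated with a short sketch. It assumes a handful of made-up shopping transactions and a hypothetical minimum support threshold of 0.5, and it enumerates candidate item sets by brute force rather than using an optimized algorithm such as Apriori. The function names support, frequent_itemsets, and confidence are chosen for this example; confidence is a standard measure of rule strength that the text above does not define.

```python
from itertools import combinations

# Hypothetical transactions: each is a set of items bought together
transactions = [
    frozenset({"bread", "butter", "milk"}),
    frozenset({"bread", "butter"}),
    frozenset({"bread", "jam"}),
    frozenset({"milk", "butter"}),
    frozenset({"bread", "butter", "jam"}),
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    itemset = frozenset(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def frequent_itemsets(transactions, min_support, max_size=2):
    """Brute-force search for item sets whose support meets min_support."""
    items = sorted(set().union(*transactions))
    result = {}
    for size in range(1, max_size + 1):
        for combo in combinations(items, size):
            s = support(combo, transactions)
            if s >= min_support:
                result[frozenset(combo)] = s
    return result


def confidence(a, b, transactions):
    """How often B appears in transactions that contain A: supp(A ∪ B) / supp(A)."""
    return support(set(a) | set(b), transactions) / support(a, transactions)


frequent = frequent_itemsets(transactions, min_support=0.5)
rule_confidence = confidence({"bread"}, {"butter"}, transactions)
print(sorted(sorted(s) for s in frequent), rule_confidence)
```

In this toy data, {bread} and {bread, butter} are both frequent, so the rule “bread implies butter” is a candidate, and butter appears in three of the four transactions that contain bread.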

Limitations of Machine Learning

1. Data Acquisition

Machine learning requires massive datasets to train on, and these should be inclusive, unbiased, and of good quality. There can also be times when practitioners must wait for new data to be generated.

2. Time and Resources

ML needs enough time for the algorithms to learn and develop to the point where they fulfil their purpose with a considerable degree of accuracy and relevance. It also needs massive resources to function, which can mean additional requirements for computing power.

3. High error-susceptibility

Machine learning is autonomous but highly susceptible to errors. Suppose you train an algorithm on datasets too small to be inclusive. You end up with biased predictions coming from a biased training set. In an advertising system, for example, this leads to irrelevant advertisements being displayed to customers. Such blunders can set off a chain of errors that goes undetected for long periods of time. And when they do get noticed, it takes quite some time to recognize the source of the issue, and even longer to correct it.

