Introduction to Machine Learning

What is Machine Learning?

Humans learn from past experience, while machines follow instructions given by humans. However, we can also train machines to learn from past data instead of following explicit instructions. This is called Machine Learning.


Machine Learning Model

To improve the accuracy of a Machine Learning model, two things matter most:

  1. More Data
  2. Better Model   

Applications of Machine Learning

  • Health Care
  • Sentiment Analysis (Like, Dislike)
  • Fraud Detection
  • E-Commerce


Machine Learning can be broadly divided into two categories: supervised learning and unsupervised learning.

Supervised Learning

In supervised learning, the model is built before the analysis is done, and an algorithm is then applied to estimate the parameters of the model from labeled examples. Classification, Decision Trees, Bayesian Classification, and Neural Networks are common examples of supervised learning.

Classification

Classification is a supervised learning technique that maps data into predefined groups (classes). It is used to develop a model that can classify a large population of records. The classification algorithm requires that the classes be defined in advance, based on the attribute values of the data. The classifier training algorithm uses these predefined examples to determine the set of parameters required to discriminate properly between classes.
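
To make the idea of "determining parameters from predefined examples" concrete, here is a minimal sketch. It assumes a nearest-centroid classifier, which is only one simple choice and is not named in this post. The training data, the class names, and the function names train_centroids and classify are all hypothetical.

```python
from collections import defaultdict
from math import dist

def train_centroids(examples):
    """Learn one centroid (mean point) per class from labeled examples."""
    sums = {}
    counts = defaultdict(int)
    for features, label in examples:
        if label not in sums:
            sums[label] = [0.0] * len(features)
        for i, value in enumerate(features):
            sums[label][i] += value
        counts[label] += 1
    return {
        label: tuple(total / counts[label] for total in sums[label])
        for label in sums
    }

def classify(centroids, features):
    """Assign the class whose centroid is closest to the feature vector."""
    return min(centroids, key=lambda label: dist(centroids[label], features))

# Hypothetical training data: (height_cm, weight_kg) with a predefined class.
training = [
    ((150.0, 50.0), "small"),
    ((155.0, 55.0), "small"),
    ((180.0, 85.0), "large"),
    ((185.0, 90.0), "large"),
]

centroids = train_centroids(training)        # the learned parameters
prediction = classify(centroids, (178.0, 80.0))
print(centroids, prediction)
```

Here the centroids play the role of the model parameters: they are estimated once from the labeled examples and then reused to classify new records.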


Decision Tree

A Decision Tree is a flowchart-like tree structure in which each internal node denotes a test on an attribute value, each branch represents an outcome of the test, and the leaves represent classes. The derived model can be represented in different forms, such as classification (if-then) rules, decision trees, mathematical formulas, or neural networks. A decision tree can easily be converted to classification rules. Decision trees are simple to understand and can provide good results even with small datasets. Decision tree algorithms can be used for classification in a wide range of application areas such as manufacturing, financial analysis, fraud detection, and education. ID3, CART, J48, NB Tree, and REP Tree are some commonly used decision tree algorithms.
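
The structure described above can be sketched in a few lines. This is an illustration only: the tree is written by hand rather than learned by ID3 or CART, and the attributes (outlook, humidity), the class labels, and the function names predict and to_rules are hypothetical.

```python
# Hypothetical tree: internal nodes test an attribute, leaves hold a class.
tree = {
    "attribute": "outlook",
    "branches": {
        "sunny": {
            "attribute": "humidity",
            "branches": {"high": "stay_in", "normal": "play"},
        },
        "overcast": "play",
        "rainy": "stay_in",
    },
}

def predict(node, record):
    """Follow the branch matching each tested attribute until a leaf is reached."""
    while isinstance(node, dict):
        node = node["branches"][record[node["attribute"]]]
    return node

def to_rules(node, conditions=()):
    """Turn every root-to-leaf path into one if-then rule."""
    if not isinstance(node, dict):
        premise = " AND ".join(conditions) or "TRUE"
        return [f"IF {premise} THEN {node}"]
    rules = []
    for value, child in node["branches"].items():
        test = f"{node['attribute']} = {value}"
        rules.extend(to_rules(child, conditions + (test,)))
    return rules

label = predict(tree, {"outlook": "sunny", "humidity": "normal"})
rules = to_rules(tree)
print(label)
for rule in rules:
    print(rule)
```

Each printed rule corresponds to exactly one path from the root to a leaf, so the rule set and the tree classify every record the same way.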


Bayesian Classification

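A Bayesian Classifier is a statistical classifier that can predict class membership probabilities, that is, the probability that a given tuple belongs to a particular class. Bayesian Classification is based on Bayes' theorem.

Bayes' theorem is as follows:

P(H|X) = P(X|H) P(H) / P(X)

Here X is the observed data tuple, H is the hypothesis that X belongs to a particular class, P(H) is the prior probability of H, P(X|H) is the probability of observing X given that H holds, and P(H|X) is the posterior probability of H given X.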
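
As a worked illustration of Bayes' theorem, the sketch below computes a posterior probability from a prior and two likelihoods. All numbers are made up for the example (a hypothetical fraud check), and the function name posterior is an assumption. The evidence term P(X) is expanded with the law of total probability.

```python
def posterior(prior, likelihood, likelihood_if_not):
    """Return P(H|X) given P(H), P(X|H), and P(X|not H)."""
    # P(X) = P(X|H) P(H) + P(X|not H) P(not H)
    evidence = likelihood * prior + likelihood_if_not * (1.0 - prior)
    return likelihood * prior / evidence

# Hypothetical numbers:
#   H = "the transaction is fraudulent", X = "the transaction was flagged"
p_fraud = 0.01               # P(H): 1% of transactions are fraudulent
p_flag_given_fraud = 0.90    # P(X|H): 90% of fraudulent ones get flagged
p_flag_given_legit = 0.05    # P(X|not H): 5% of legitimate ones get flagged

p_fraud_given_flag = posterior(p_fraud, p_flag_given_fraud, p_flag_given_legit)
print(round(p_fraud_given_flag, 3))  # roughly 0.154
```

Even with a fairly accurate flag, the posterior stays low because the prior probability of fraud is small. A Bayesian classifier compares such posteriors across classes and picks the largest.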


Neural Networks

A Neural Network is a collection of neuron-like processing units with weighted connections between them. It is composed of many elements, called nodes, which are connected to one another. The connection between two nodes is weighted, and the network is trained by adjusting these weights. A classification model can be represented in different forms, such as a Neural Network or a Decision Tree. Neural networks have many advantages, such as adaptive learning ability, self-organization, real-time operation, and insensitivity to noise. They are used to identify patterns or trends in data and are well suited to prediction and forecasting. The most common neural network algorithms are Backpropagation, NN Supervised Learning, and Radial Basis Function Network.
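
To show what "training by adjusting weights" looks like, here is a minimal sketch of a single artificial neuron trained with the perceptron learning rule. This is a deliberately simpler technique than backpropagation, swapped in for brevity. The AND dataset, the learning rate, the epoch count, and the function names are assumptions made for the example.

```python
def neuron_output(weights, bias, inputs):
    """Weighted sum of the inputs followed by a step activation."""
    total = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1 if total > 0 else 0

def train_perceptron(samples, epochs=20, learning_rate=1.0):
    """Adjust the connection weights whenever the neuron's output is wrong."""
    weights = [0.0] * len(samples[0][0])
    bias = 0.0
    for _ in range(epochs):
        for inputs, target in samples:
            error = target - neuron_output(weights, bias, inputs)
            # Move each weight in the direction that reduces the error.
            weights = [w + learning_rate * error * x
                       for w, x in zip(weights, inputs)]
            bias += learning_rate * error
    return weights, bias

# Hypothetical training set: the logical AND function.
and_samples = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]

weights, bias = train_perceptron(and_samples)
predictions = [neuron_output(weights, bias, x) for x, _ in and_samples]
print(weights, bias, predictions)
```

A single neuron can only separate classes with a straight line. Multi-layer networks trained with backpropagation apply the same idea of error-driven weight updates across several layers.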


Unsupervised Learning

In unsupervised learning, we do not create a model before the analysis is done. Instead, we apply the algorithm directly to the dataset and observe the result. A model may then be created on the basis of the results obtained. Clustering is one example of unsupervised learning.

Clustering

Clustering is the process of grouping data into classes (clusters) so that objects within a cluster are highly similar to one another but very dissimilar to objects in other clusters. Dissimilarity is assessed on the basis of the attribute values describing the objects, often using a distance measure. Clustering is frequently used in data mining applications to discover patterns in huge datasets. There are many clustering techniques, such as Partitioning methods (K-means, K-medoids), Hierarchical methods (CURE, CHAMELEON), Density-based methods (DBSCAN, OPTICS), Grid-based methods (STING, CLIQUE), and Model-based methods (EM algorithm).
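
As an illustration of a partitioning method, the following is a minimal K-means sketch using Euclidean distance. It assumes the first k points are used as the initial centroids, a simplification chosen to keep the example deterministic. The sample points and the function name k_means are hypothetical.

```python
from math import dist

def k_means(points, k, iterations=10):
    """Group points into k clusters by repeatedly reassigning and re-averaging."""
    centroids = list(points[:k])  # assumption: first k points as initial seeds
    clusters = [[] for _ in range(k)]
    for _ in range(iterations):
        # Assignment step: put each point in the cluster of its nearest centroid.
        clusters = [[] for _ in range(k)]
        for point in points:
            nearest = min(range(k), key=lambda i: dist(point, centroids[i]))
            clusters[nearest].append(point)
        # Update step: move each centroid to the mean of its members.
        for i, members in enumerate(clusters):
            if members:
                centroids[i] = tuple(sum(axis) / len(members)
                                     for axis in zip(*members))
    return centroids, clusters

# Hypothetical 2-D data with two visible groups.
points = [(1.0, 1.0), (1.5, 2.0), (8.0, 8.0),
          (9.0, 9.5), (1.0, 0.5), (8.5, 9.0)]

centroids, clusters = k_means(points, k=2)
print(centroids)
print(clusters)
```

No class labels are supplied: the groups emerge from the distances alone, which is what distinguishes clustering from classification.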


Other Data Mining Techniques

Other Data Mining techniques are Association Rule Mining, Prediction, Time Series Analysis and Sequential Patterns.

Association Rule Mining

Association Rule Mining is the discovery of association relationships or correlations among a set of items. Association and correlation are used to find frequent itemsets in large datasets. The number of association rules for a given dataset is generally very large, and many of them are of little value, which is why interestingness measures are used to filter them. The main task of association rule mining is to find sets of binary variables that frequently co-occur in a transaction database. Association rule mining includes algorithms such as Apriori, CDA, and DDA. Association rules are if-then statements that help uncover relationships between seemingly unrelated data in a relational database. The most common types are multilevel association rules, multidimensional association rules, and quantitative association rules.
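
Two standard interestingness measures for an if-then rule are support (how often the items occur together) and confidence (how often the rule holds when its "if" part is present). This post does not name them, so treat the choice as an assumption. The sketch below computes both on a tiny hypothetical transaction database, with made-up item names.

```python
# Hypothetical market-basket transactions.
transactions = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "eggs"},
    {"bread", "butter", "eggs"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in the itemset."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(antecedent, consequent, transactions):
    """Of the transactions containing the antecedent, the fraction that
    also contain the consequent."""
    both = set(antecedent) | set(consequent)
    return support(both, transactions) / support(antecedent, transactions)

# Rule under test: IF bread THEN butter
rule_support = support({"bread", "butter"}, transactions)
rule_confidence = confidence({"bread"}, {"butter"}, transactions)
print(rule_support, rule_confidence)
```

Algorithms such as Apriori avoid scoring every possible itemset by only extending itemsets whose support already exceeds a chosen threshold.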

Prediction

Prediction is a data mining technique used to identify the relationships among independent variables and between dependent and independent variables. Regression can be used to generate a model for prediction: regression analysis models the relationship between one or more independent variables and a dependent variable. Prediction techniques can be used to estimate the likely values of missing data and the value distribution of certain attributes in a set of objects. The most common regression techniques are Linear Regression, Nonlinear Regression, and Multivariate Nonlinear Regression.
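
A minimal sketch of linear regression with one independent variable, fitted by ordinary least squares, is shown below. The data points and the function name fit_line are hypothetical. The fitted line is then used to estimate a value that is missing from the data.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical observations: independent variable x, dependent variable y.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [3.0, 5.0, 7.0, 9.0, 11.0]

slope, intercept = fit_line(xs, ys)
missing_y = slope * 6.0 + intercept   # estimate y for an unobserved x = 6
print(slope, intercept, missing_y)
```

Real data rarely falls exactly on a line. In that case the same formulas return the line that minimizes the squared prediction error.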

Time Series Analysis

A time series is a sequence of data points, typically measured at successive times spaced at uniform intervals, such as stock prices, currency exchange rates, and product sales volumes, collected over monotonically increasing time. Time Series Analysis studies such sequences. Rule induction algorithms such as Version Space, AQ15, and C4.5 are commonly used in time series data mining applications.

Sequential Patterns

Sequential pattern mining is a data mining technique that seeks to discover similar patterns in transaction data over a business period. The uncovered patterns are used in further business analysis to recognize relationships among data.

