Analytics with K Nearest Neighbor Classification
K Nearest Neighbor Classification is a pattern recognition algorithm. It is a
non-parametric method used for classification and regression. In both cases,
the input consists of the k closest training examples in the feature space.
We can consider each of the characteristics in our data set as a different
dimension in some space, and take the value an observation has for a
characteristic to be its coordinate in that dimension, which gives a set of
points in space. We can then take the similarity of two points to be the
distance between them in this space under some appropriate metric. To decide
which points from the training set are similar enough to be considered when
choosing the class for a new observation, the algorithm picks the k data points
closest to the new observation and takes the most common class among them.
Hence the name: the k Nearest Neighbors algorithm.
The Algorithm
The algorithm can be summarized as:
1. A positive integer k is specified, along with a new sample.
2. We select the k entries in our database which are closest to the new sample.
3. We find the most common classification of these entries.
4. This is the classification we give to the new sample.
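The four steps above can be sketched in a few lines of Python. This is only a minimal illustration, not a reference implementation: the function name knn_classify, the toy data, and the use of Euclidean distance (math.dist, available from Python 3.8) are all assumptions made for the example. Ties in the vote are broken in favor of the label seen first among the nearest neighbors, which is also just a choice for this sketch.

```python
import math
from collections import Counter

def knn_classify(training, new_sample, k):
    """Predict a class for new_sample by majority vote among its k nearest neighbors.

    training is a list of (features, label) pairs; features is a tuple of numbers.
    """
    # Step 1: k must be a positive integer no larger than the training set.
    if k < 1 or k > len(training):
        raise ValueError("k must be between 1 and the number of training entries")
    # Step 2: sort the entries by Euclidean distance to the new sample, keep the k closest.
    nearest = sorted(training, key=lambda entry: math.dist(entry[0], new_sample))[:k]
    # Step 3: count how often each class occurs among those k entries.
    votes = Counter(label for _, label in nearest)
    # Step 4: the most common class is the prediction.
    return votes.most_common(1)[0][0]

# Hypothetical toy data: two well-separated groups labelled "A" and "B".
toy = [
    ((1.0, 1.0), "A"), ((1.5, 2.0), "A"), ((2.0, 1.0), "A"),
    ((6.0, 6.0), "B"), ((7.0, 5.5), "B"),
]
prediction = knn_classify(toy, (1.2, 1.4), k=3)
print(prediction)
```

Sorting the whole training set is the simplest way to find the k closest entries; for large data sets a partial sort or a spatial index would usually be preferred.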
Closeness can be measured with various distance measures, so the distance
measure we choose affects the final result.
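To see how the choice of distance measure can change the outcome, the sketch below compares Euclidean and Manhattan distance on two made-up points. The metrics, the points, and the helper names (euclidean, manhattan, nearest) are chosen purely for illustration and do not come from the example data in this post.

```python
import math

def euclidean(a, b):
    # Straight-line distance: square root of the summed squared differences.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def manhattan(a, b):
    # City-block distance: sum of the absolute differences.
    return sum(abs(x - y) for x, y in zip(a, b))

def nearest(points, query, distance):
    # Return the point with the smallest distance to the query.
    return min(points, key=lambda p: distance(p, query))

query = (0, 0)
points = [(3, 3), (0, 5)]  # hypothetical training points

nearest_euclidean = nearest(points, query, euclidean)  # (3, 3): about 4.24 versus 5.0
nearest_manhattan = nearest(points, query, manhattan)  # (0, 5): 6 versus 5
print(nearest_euclidean, nearest_manhattan)
```

The same query gets a different nearest neighbor under each metric, so with k=1 the predicted class could differ as well.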
Example:
Consider data concerning credit default. Age and Loan are two numerical
variables (predictors) and Default is the target.
D = sqrt[(48 - 33)^2 + (142000 - 150000)^2] = 8000.01, giving Default=Y
With K=3, there are two Default=Y and one Default=N among the three closest
neighbors. The prediction for the unknown case is again Default=Y.
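The arithmetic in this example can be checked with a short script. It uses only the numbers quoted above: the two (Age, Loan) points from the distance formula, and the two Default=Y and one Default=N votes for K=3. The variable names are assumptions for illustration.

```python
import math
from collections import Counter

# The two (Age, Loan) points that appear in the distance formula.
point_a = (48, 142000)
point_b = (33, 150000)

age_term = (point_a[0] - point_b[0]) ** 2    # 15^2 = 225
loan_term = (point_a[1] - point_b[1]) ** 2   # 8000^2 = 64,000,000
distance = math.sqrt(age_term + loan_term)
print(round(distance, 2))  # 8000.01

# Majority vote for K=3, using the class counts stated in the example.
votes = Counter({"Y": 2, "N": 1})
prediction = votes.most_common(1)[0][0]
print(prediction)  # Y
```

Note that the Loan term (64,000,000) dwarfs the Age term (225), so in unscaled form the distance here is determined almost entirely by Loan.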
The K Nearest Neighbor algorithm has a weakness when the number of dimensions
is high and the number of training samples is low: the "nearest" neighbor
might be very far away, and in high dimensions "nearest" becomes meaningless.
On the other hand, it is an easy algorithm to understand, and it handles
missing values effectively (by restricting the distance calculation to the
subspace of dimensions where values are available).
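One way to read "restrict distance calculation to subspace" is to compute the distance over only those dimensions where both points have a value. The sketch below assumes missing values are marked with None and uses Euclidean distance; the function name subspace_distance and the sample points are hypothetical.

```python
import math

def subspace_distance(a, b):
    """Euclidean distance over only the dimensions observed in both points.

    Missing values are represented by None (an assumption of this sketch).
    """
    shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not shared:
        raise ValueError("no dimension is observed in both points")
    return math.sqrt(sum((x - y) ** 2 for x, y in shared))

# The second coordinate of the first point is missing, so only
# dimensions 0 and 2 are used: sqrt(3^2 + 4^2) = 5.0
d = subspace_distance((1.0, None, 4.0), (4.0, 2.0, 8.0))
print(d)  # 5.0
```

Distances computed over different numbers of dimensions are not directly comparable, so in practice some rescaling by the number of shared dimensions may be needed.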
Thanks and Regards,
Amoeba Technologies Core Team | info@amoebatechno.com | www.amoebatechno.com | 24 x 7 Support No: +91-8886516000
Facebook: https://www.facebook.com/AmoebaTechnologies