Basic Understanding of KNN Algorithm

Nowadays, as Machine Learning and Artificial Intelligence are applied to more and more industries, people have various needs when they look at different algorithms. K Nearest Neighbors (KNN) is one of the most popular of these models, and in this post, I will discuss what the KNN algorithm is and in which situations it works best.

Algorithm Overview

The k-nearest neighbors (KNN) algorithm is a simple, easy-to-implement supervised machine learning algorithm that can be used to solve both classification and regression problems.

How does K-NN work?

The steps below describe how K-NN classifies a new data point:

Step-1: Select the number K of neighbors.

Step-2: Calculate the Euclidean distance from the new data point to every point in the training data.

Step-3: Take the K nearest neighbors according to the calculated Euclidean distances.

Step-4: Among these K neighbors, count the number of data points in each category.

Step-5: Assign the new data point to the category that has the most neighbors.

Step-6: The prediction is complete; repeat Steps 2 to 5 for each new data point.
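The steps above can be sketched in plain Python. This is a minimal, unoptimized sketch that assumes small in-memory lists of numeric features; the helper names `euclidean_distance` and `knn_classify` and the toy data are hypothetical, chosen only for illustration. The Euclidean distance between points p and q is sqrt((p1 - q1)^2 + ... + (pn - qn)^2).

```python
import math
from collections import Counter


def euclidean_distance(a, b):
    """Straight-line distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_classify(train_points, train_labels, query, k):
    """Predict a label for `query` by majority vote among its k nearest training points."""
    # Step 2: distance from the query to every training point.
    distances = [
        (euclidean_distance(point, query), label)
        for point, label in zip(train_points, train_labels)
    ]
    # Step 3: keep the k closest points.
    nearest = sorted(distances, key=lambda pair: pair[0])[:k]
    # Steps 4-5: count labels among the neighbors and return the most common one.
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical toy data: two loose clusters labelled "A" and "B".
X = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.5), (6.0, 6.0), (6.5, 7.0), (7.0, 6.5)]
y = ["A", "A", "A", "B", "B", "B"]

print(knn_classify(X, y, (1.8, 1.8), k=3))  # → A
print(knn_classify(X, y, (6.2, 6.4), k=3))  # → B
```

Note that there is no training step: the "model" is just the stored data, and all the work happens at prediction time.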

What is the k value, and how do we determine it?

If k is too small, outliers and noisy data points have an outsized influence and the results become inaccurate. If k is too large, the decision boundary becomes oversimplified. In other words, increasing k increases the bias, while decreasing k increases the variance. We can use the Elbow method to determine the optimal k: compute the error rate on held-out data for a range of k values, plot it against k, and pick the point where the curve stops improving noticeably. Please see this link for a comprehensive tutorial on the Elbow method. We can also try multiple k values of our choice and keep the one that fits the model best.
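One way to apply the Elbow method is sketched below under our own assumptions: hold out part of the data, compute the error rate for a range of k values, and look for the point where the curve flattens. The data is synthetic (two overlapping Gaussian clusters), and `knn_classify`, `error_rate`, and `make_blob` are hypothetical helpers written for this example.

```python
import math
import random
from collections import Counter


def knn_classify(train_points, train_labels, query, k):
    """Majority vote among the k training points closest (Euclidean) to `query`."""
    nearest = sorted(
        zip(train_points, train_labels),
        key=lambda pair: math.dist(pair[0], query),
    )[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def error_rate(train_X, train_y, eval_X, eval_y, k):
    """Fraction of evaluation points that KNN with this k misclassifies."""
    wrong = sum(
        knn_classify(train_X, train_y, x, k) != label
        for x, label in zip(eval_X, eval_y)
    )
    return wrong / len(eval_y)


def make_blob(n, center, label):
    """Hypothetical 2-D Gaussian cluster of n points around (center, center)."""
    return [((random.gauss(center, 1.5), random.gauss(center, 1.5)), label)
            for _ in range(n)]


random.seed(0)
data = make_blob(60, 0.0, "A") + make_blob(60, 2.0, "B")  # overlapping classes
random.shuffle(data)
train_X, train_y = zip(*data[:80])  # 80 points to search among
val_X, val_y = zip(*data[80:])      # 40 held-out points used to score each k

# Odd k values avoid tied votes between the two classes.
for k in range(1, 22, 2):
    print(f"k={k:2d}  validation error={error_rate(train_X, train_y, val_X, val_y, k):.3f}")

# At k=1 each training point is its own nearest neighbor, so training error is 0.
# That is why k is chosen on held-out data rather than on the training set.
print(error_rate(train_X, train_y, train_X, train_y, 1))  # → 0.0
```

The exact validation errors depend on the random data, so the sketch only prints them; in practice you would plot them against k and read off the elbow. The zero training error at k=1 is the high-variance case described above: the model memorizes every point, noise included.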

Advantages of KNN

  1. This algorithm is simple and easy to implement, and there is no model to train: apart from choosing k, no parameters need to be estimated.
  2. It is well suited to classification tasks and can be applied broadly to real-world problems.
  3. It can be used for both classification and regression.
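The steps earlier in this post describe classification. For regression, a common variant replaces the majority vote with the average of the k nearest neighbors' target values. The sketch below assumes that averaging rule; `knn_regress` and the house-size data are hypothetical, made up for illustration.

```python
import math


def knn_regress(train_points, train_targets, query, k):
    """Predict a number for `query` as the mean target of its k nearest neighbors."""
    nearest = sorted(
        zip(train_points, train_targets),
        key=lambda pair: math.dist(pair[0], query),
    )[:k]
    # Average the neighbors' targets instead of voting on a label.
    return sum(target for _, target in nearest) / len(nearest)


# Hypothetical data: house size in square meters -> price in thousands.
sizes = [(50,), (60,), (70,), (80,), (120,), (130,)]
prices = [150, 180, 200, 230, 340, 360]

print(knn_regress(sizes, prices, (66,), k=2))  # → 190.0 (mean of the 70 and 60 m² prices)
```

Only the final aggregation step differs from the classifier, which is why the same algorithm covers both problem types.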

Disadvantages of KNN

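  1. Predictions become much slower on large datasets, because each new data point has to be compared with every point in the training data.
  2. It is sensitive to outliers (especially when k is small) and to missing data.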
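To see the outlier sensitivity concretely, the sketch below places one mislabeled point inside a cluster and classifies a query right next to it with two different k values. The data and the `knn_classify` helper are hypothetical, written only to illustrate the effect.

```python
import math
from collections import Counter


def knn_classify(train_points, train_labels, query, k):
    """Majority vote among the k training points closest (Euclidean) to `query`."""
    nearest = sorted(
        zip(train_points, train_labels),
        key=lambda pair: math.dist(pair[0], query),
    )[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


# Hypothetical data: an "A" cluster containing one mislabeled "B" point (the outlier),
# plus a genuine "B" cluster far away.
X = [(1.0, 1.0), (1.2, 0.8), (0.8, 1.2), (1.1, 1.3), (0.9, 0.7), (1.0, 1.05),
     (5.0, 5.0), (5.2, 4.8), (4.8, 5.2)]
y = ["A", "A", "A", "A", "A", "B",
     "B", "B", "B"]

query = (1.0, 1.1)  # sits right next to the mislabeled point
print(knn_classify(X, y, query, k=1))  # → B (the single outlier decides)
print(knn_classify(X, y, query, k=5))  # → A (the outlier is outvoted 4 to 1)
```

A larger k dilutes a single bad point, which is the trade-off discussed in the section on choosing k.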

KNN in Industry Applications

Since KNN works for both classification and regression problems, it can be applied widely in real-world situations. For example, we can use KNN to classify customers' financial situations, helping banks and loan institutions better understand and predict the risk behind each customer. Banks can also use KNN to build customer profiles for future analysis and to detect abnormal transactions.

KNN can also be used in the healthcare industry, for example to predict whether a patient who shows a particular symptom has a higher chance of having a specific disease, such as cancer.

However, we need to keep in mind that KNN may not work well on very large datasets. In those situations, we may need to combine its results with those of other models and evaluate the combined outcome to make a better prediction.