Hansy Kumaralal

Reputation: 169

How can KNN algorithm be applied to categorical variables?

I am working on a data set where most of the variables are categorical. Some variables have as many as 5 categories. Is it possible to apply the KNN algorithm in a situation like this? If so, how should I handle these categorical variables? Do I have to normalize them? I am using R, and it would help if someone could direct me to a source.

Upvotes: 2

Views: 7521

Answers (1)

Jandre Marais

Reputation: 328

Your first step would be to decide on a distance/dissimilarity function between your observations.

One option is to transform your categorical variables into dummy binary variables and then calculate the Jaccard distance between each row pair. Here is a simple tutorial for these steps.
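
As a rough illustration of the dummy-variable and Jaccard steps, here is a minimal sketch in base R. The data frame `df` and its column names are made up for the example. It relies on `dist(..., method = "binary")`, which on 0/1 data gives the Jaccard distance (the share of "on" columns that two rows do not have in common).

```r
# Hypothetical toy data: three categorical predictors
df <- data.frame(
  colour = factor(c("red", "blue", "red", "green")),
  size   = factor(c("S", "M", "S", "L")),
  shape  = factor(c("round", "square", "round", "round"))
)

# One-hot encode each factor, keeping every level (no reference level dropped)
dummies <- do.call(cbind, lapply(names(df), function(v) {
  m <- model.matrix(~ x - 1, data = data.frame(x = df[[v]]))
  colnames(m) <- paste(v, levels(df[[v]]), sep = "_")
  m
}))

# Pairwise Jaccard distance between rows of the 0/1 matrix
d <- as.matrix(dist(dummies, method = "binary"))
round(d, 2)
```

Rows 1 and 3 are identical, so their distance is 0. Rows 1 and 2 share no category, so their distance is 1. No normalization is needed here, since every dummy column is already on the same 0/1 scale.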

Once you have a distance defined, you can proceed with the KNN algorithm as usual. I'm not sure whether any R package already implements this or whether you would have to program it yourself, but it shouldn't be that complicated.
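
If you do end up programming it yourself, a hand-rolled version could look something like the sketch below. `knn_predict` is a hypothetical helper written for this example, not a function from any package, and the small 0/1 matrix stands in for your dummy-encoded data. It takes the distances from one new observation to the labelled rows and returns the majority class among the `k` nearest.

```r
# Hypothetical helper: classify one observation from its distances to the training rows
knn_predict <- function(dist_to_train, train_labels, k = 3) {
  nearest <- order(dist_to_train)[seq_len(k)]  # indices of the k closest training rows
  votes <- table(train_labels[nearest])        # count class labels among them
  names(votes)[which.max(votes)]               # majority class; ties go to the first class
}

# Toy dummy-encoded data: rows 1-4 are labelled, row 5 is the new observation
x <- rbind(
  c(1, 0, 1, 0),
  c(1, 0, 0, 1),
  c(0, 1, 0, 1),
  c(0, 1, 1, 0),
  c(1, 0, 1, 0)
)
labels <- c("a", "a", "b", "b")

# Jaccard distances between all rows, then classify row 5 against rows 1-4
d <- as.matrix(dist(x, method = "binary"))
pred <- knn_predict(d[5, 1:4], labels, k = 3)
pred
```

Computing the full distance matrix is fine for small data sets. For larger ones you would only want the distances between new and training rows.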

Upvotes: 4
