Reputation: 93
I am performing a KNN analysis of some data. I have both categorical (with more than two levels) and continuous variables. I found a package that accounts for this situation (knncat), but there's very little documentation explaining how it actually works.
I wish to use cross-validation (which I believe can be done by simply providing no training data) and I've run into an issue: I do not know how this package goes about normalising the data, so I don't know whether I should normalise the numeric data before using it or just leave it as is.
Does anyone know how knncat handles this? Or is anyone able to recommend a better method or package for handling KNN with a mix of categorical and numeric data?
Upvotes: 0
Views: 311
Reputation: 2469
The best way to check what a function is doing is to look inside its body. You can do that with:
getAnywhere(knncat)
As far as I can see, there is no scaling or centering of the data in there.
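If you decide to standardise the numeric columns yourself before calling knncat, a minimal base R sketch could look like the one below. The data frame `df` and its columns `height` and `colour` are made up for illustration. Note that when cross-validating it is generally safer to compute the scaling parameters on the training folds only, which this sketch does not do.

```r
# Hypothetical example data: one numeric and one categorical predictor
df <- data.frame(
  height = c(150, 160, 170, 180),
  colour = factor(c("red", "blue", "red", "green"))
)

# Identify the numeric columns; factors are left untouched
num_cols <- vapply(df, is.numeric, logical(1))

# Centre each numeric column to mean 0 and scale it to standard deviation 1
df[num_cols] <- lapply(df[num_cols], function(x) as.numeric(scale(x)))
```

After this, every numeric column contributes on a comparable scale to the distance calculation.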
As to the second question, the way to deal with categorical data is to create dummy variables (knncat is in fact doing that). But you can create the dummy variables yourself.
For example,
data <- cbind(data, model.matrix(~ cat.var - 1, data = data))  # one 0/1 column per level of cat.var
Alternatively, you can use the caret package, which provides dummyVars to do this for you.
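As a rough sketch of the caret route, assuming the caret package is installed and using a made-up data frame `df` with columns `height` and `colour`:

```r
library(caret)

# Hypothetical example data: one numeric and one categorical predictor
df <- data.frame(
  height = c(150, 160, 170, 180),
  colour = factor(c("red", "blue", "red", "green"))
)

# Build the encoding rules from the data, then apply them.
# With the default fullRank = FALSE, each factor level gets its own 0/1 column.
dv <- dummyVars(~ ., data = df)
encoded <- predict(dv, newdata = df)

# Numeric columns pass through unchanged; the factor is expanded
colnames(encoded)
```

The result is a numeric matrix, so it can be passed to KNN implementations that only accept numeric input.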
Upvotes: 1