Reputation: 3264
I am trying to run k-modes on a large categorical dataset.
Every variable contains several NAs, and I want to keep them because the missingness is meaningful to me.
K-modes does not work on a dataset with NAs, so I am looking for a fast way to turn NA into a regular factor level in all variables at once.
I have read many related questions, but the answers only work when each variable is handled by name.
Any suggestions using R?
mydf <- data.frame(a = factor(c("a", NA, NA)), b = factor(c("b", NA, NA)), c = factor(c("yo", NA, NA)))
Upvotes: 0
Views: 199
Reputation: 5017
Try this:
mydf <- data.frame(a = factor(c("a", NA, NA)), b = factor(c("b", NA, NA)), c = factor(c("yo", NA, NA)))
Convert every factor column to character:
mydf <- data.frame(lapply(mydf, as.character), stringsAsFactors=FALSE)
Replace every NA with a placeholder string:
mydf[is.na(mydf)]<-"Something"
Convert the columns back to factors:
mydf <- data.frame(lapply(mydf, as.character), stringsAsFactors=TRUE)
The new factor now has the placeholder as a level:
factor(mydf$a)
[1] a Something Something
Levels: a Something
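If you would rather not round-trip through character, something like the following might also work. It is only a sketch: it uses base R's addNA() to make NA an explicit level in every column, then renames that level. The label "Missing" and the variable name na_label are my own placeholders, not anything k-modes requires.

```r
mydf <- data.frame(
  a = factor(c("a", NA, NA)),
  b = factor(c("b", NA, NA)),
  c = factor(c("yo", NA, NA))
)

# Assumed placeholder label; pick any string that is not already a level.
na_label <- "Missing"

# mydf[] keeps the data.frame structure while replacing every column.
mydf[] <- lapply(mydf, function(x) {
  x <- addNA(x, ifany = TRUE)              # add NA as a level only if the column has NAs
  levels(x)[is.na(levels(x))] <- na_label  # rename the NA level to the placeholder
  x
})

levels(mydf$a)
```

This assumes every column is already a factor, as in the example data. Because the level is appended rather than re-sorted, the original level order is preserved and the placeholder comes last.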
Upvotes: 3