Consider NA as factor level for several variables

Question

I am trying to run k-modes on a large categorical dataset.

Each variable contains several NAs, but I want to keep this information because it is meaningful to me.

K-modes does not work on a dataset with NAs, so I am looking for a fast way to treat NA as a factor level in every variable at once.

I have read many related questions, but the answers only work one variable at a time, by name.

Any suggestions using R?

mydf <- data.frame(a = factor(c("a", NA, NA)), b = factor(c("b", NA, NA)), c = factor(c("yo", NA, NA)))

Terru_theTerror · Accepted Answer

Try this:

mydf <- data.frame(a = factor(c("a", NA, NA)), b = factor(c("b", NA, NA)), c = factor(c("yo", NA, NA)))

From factor to character

mydf <- data.frame(lapply(mydf, as.character), stringsAsFactors=FALSE)

Substitution

mydf[is.na(mydf)] <- "Something"

Back to factor

mydf <- data.frame(lapply(mydf, as.character), stringsAsFactors=TRUE)

Your new factor

factor(mydf$a)
[1] a         Something Something
Levels: a Something
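
If you would rather skip the round trip through character, the same idea can be sketched as a small helper applied to every column. This is only an illustration under a few assumptions: the helper name na_as_level and the label "Missing" are made up for this example, and every column is assumed to be a factor (or something factor() can convert).

```r
mydf <- data.frame(a = factor(c("a", NA, NA)),
                   b = factor(c("b", NA, NA)),
                   c = factor(c("yo", NA, NA)))

# Hypothetical helper: turn the NAs of one column into an explicit level.
# "Missing" is an arbitrary label chosen for this sketch.
na_as_level <- function(f, label = "Missing") {
  f <- factor(f)
  # Append the new level; union() avoids a duplicate if it already exists
  levels(f) <- union(levels(f), label)
  # Replace the missing entries with the new level
  f[is.na(f)] <- label
  f
}

# mydf[] keeps the data.frame structure while replacing every column
mydf[] <- lapply(mydf, na_as_level)

levels(mydf$a)
# [1] "a"       "Missing"
```

Because the new level is appended rather than sorted, the original level order of each factor is preserved, and no column has to be referenced by name.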
