Reputation: 952
I'm working on generic code that loops through every column in a data frame and converts the column to a factor if the number of unique values in that column is at most, say, 32. My current progress is:
dfr <- data.frame(y = floor(runif(40, 0, 5)), z = rnorm(40))
In this example, I want variable 'y' to be converted into a factor variable. So I do:
sapply(dfr, function(x) ifelse(length(unique(x)) <= 32, x <- as.factor(x), x))
But after doing this, the class of 'y' is still not converted:
sapply(dfr, class)
y z
"numeric" "numeric"
Can anyone give guidance as to where I'm going wrong? I didn't imagine this would be so onerous.
Thanks in advance.
Upvotes: 1
Views: 857
Reputation: 115435
ifelse will return a vector the same length as the test (not what you want); use if(){}else{} instead.
You have 40 unique values in z (more than 32), so your function will not coerce it to factor; only y, with at most 5 unique values, qualifies.
sapply will coerce the results to a matrix, which will force all variables to be the same "class".
What you want to do is use lapply, and then replace the contents of the original.
dfr[] <- lapply(dfr, function(x) if (length(unique(x)) <= 32) as.factor(x) else x)
# It works!
str(dfr)
# 'data.frame': 40 obs. of 2 variables:
# $ y: Factor w/ 5 levels "0","1","2","3",..: 2 1 2 1 5 3 5 1 5 1 ...
# $ z: num 0.9036 0.2909 -0.9027 -0.4588 -0.0495 ...
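If it helps to see the pitfalls above in isolation, here is a minimal sketch. The seed, the intermediate names (one_value, simplified, converted) and the helper factorize_low_cardinality with its max_levels argument are my own illustrative assumptions, not anything from the question; the helper just wraps the lapply approach so the threshold is not hard-coded.

```r
# Fixed seed so the sketch is reproducible (an assumption, not in the question).
set.seed(1)
dfr <- data.frame(y = floor(runif(40, 0, 5)), z = rnorm(40))

# Pitfall 1: ifelse() returns a result shaped like its test. The test here
# has length 1, so only one element comes back, not a whole column.
one_value <- ifelse(length(unique(dfr$y)) <= 32, as.factor(dfr$y), dfr$y)
length(one_value)

# Pitfall 2: even with a proper if/else, sapply() simplifies the two
# equal-length results into a single matrix, so the factor class is lost.
simplified <- sapply(dfr, function(x) if (length(unique(x)) <= 32) as.factor(x) else x)
is.matrix(simplified)

# Hypothetical helper (name and `max_levels` are illustrative): lapply()
# keeps each column's class, and df[] <- keeps the data frame structure.
factorize_low_cardinality <- function(df, max_levels = 32) {
  df[] <- lapply(df, function(x) {
    if (length(unique(x)) <= max_levels) as.factor(x) else x
  })
  df
}

converted <- factorize_low_cardinality(dfr)
sapply(converted, class)
```

Calling sapply(converted, class) at the end is fine, because class() returns a single string per column and there is nothing for the simplification to destroy.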
Upvotes: 4