Reputation: 525
I have a dataset that I want to use for building a decision tree in RStudio. Many of the factor columns contain empty (NA) values, and I want to change every empty value in those factors to "No Data". There are over 100 such columns, so I'd rather change them all at once than one by one.
Example of the data. (Please note that in my real data these are all factors. The code below creates numerics, but because I read the data in from a CSV I don't know how to reproduce the factors in an example.)
Outcome=c(1,1,1,0,0,0)
VarA=c(1,1,NA,0,0,NA)
VarB=c(0,NA,1,1,NA,0)
VarC=c(0,NA,1,1,NA,0)
VarD=c(0,1,NA,0,0,0)
VarE=c(0,NA,1,1,NA,NA)
VarF=c(NA,NA,0,1,0,0)
VarG=c(0,NA,1,1,NA,0)
df=as.data.frame(cbind(Outcome, VarA, VarB,VarC,VarD,VarE,VarF,VarG))
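To make an example like this hold factors rather than numerics, one option is to convert the columns after building the data frame. This is only a sketch: it uses a subset of the columns above and assumes every column, including Outcome, should be a factor.

```r
Outcome <- c(1, 1, 1, 0, 0, 0)
VarA <- c(1, 1, NA, 0, 0, NA)
VarB <- c(0, NA, 1, 1, NA, 0)
df <- data.frame(Outcome, VarA, VarB)

# Convert every column to a factor (assumption: all columns are categorical).
# NA stays NA; it does not become a level.
df[] <- lapply(df, factor)

str(df)
```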
Upvotes: 1
Views: 133
Reputation: 887981
When we have factor columns and want to replace one of the values with a new value, we either call factor again or add the new value to the levels of the factor before making the change. Assuming that we need to recode every variable except the first column, loop through the columns with lapply, add "No Data" as one of the levels, replace the NA elements with "No Data", and finally assign the list output back to the columns of interest:
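df[-1] <- lapply(df[-1], function(x) {
  # add the new level first; assigning an unknown level would produce NA
  levels(x) <- c(levels(x), "No Data")
  # then replace the missing values with that level
  replace(x, is.na(x), "No Data")
})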
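As a quick check of this approach, here is a minimal sketch on a small made-up data frame (the columns and values are illustrative, not the real data from the question). It also restricts the change to factor columns with is.factor, an added assumption in case the data contains non-factor columns.

```r
# Illustrative data: the first column is the outcome, the rest are factors with NAs
df <- data.frame(
  Outcome = factor(c(1, 1, 0, 0)),
  VarA    = factor(c(1, NA, 0, NA)),
  VarB    = factor(c(0, 1, NA, 1))
)

# Select factor columns, skipping the first (Outcome) column
is_fac <- vapply(df, is.factor, logical(1))
is_fac[1] <- FALSE

df[is_fac] <- lapply(df[is_fac], function(x) {
  levels(x) <- c(levels(x), "No Data")  # register the new level first
  replace(x, is.na(x), "No Data")       # then overwrite the NAs
})

levels(df$VarA)
table(df$VarB)
```

Selecting by is.factor rather than by position means a numeric or character column is left untouched instead of being coerced.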
Upvotes: 2