I'm working on an analysis of a survey in which most of the questions (105 out of 167) are rated on a scale from 1 to 10, with the code 99999 when they are not filled in. I loaded the data set into R and made a data frame with these 105 questions. When I did this I saw that the data types were not right: they were all dbl. So I first converted them to character (data set = survey):
survey <- data.frame(lapply(survey, as.character), stringsAsFactors = FALSE)
survey[survey == 99999] <- "No answer"
so that I could replace 99999 with "No answer", and then I used:
survey[] <- lapply(survey, factor)
to convert the columns to factors. The problem is that the order of the levels (the ratings) changed as soon as I converted the columns to character. I think the reason is that for some questions no one gave a rating of 1, so after the conversion the rating 10 ends up in the first position when I run, for example:
survey %>% group_by(v2_a) %>% summarize(count = n())
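The reordering comes from how factor() chooses levels when none are given: it sorts the unique values, and character strings sort alphabetically, so "10" lands before "2" (and at the very front when nobody answered 1). Numeric input is sorted numerically instead. A minimal base-R illustration with made-up values:

```r
# Made-up answers for one question; nobody chose 1
answers_num <- c(2, 10, 3, 10, 99999)
answers_chr <- as.character(answers_num)

# Character input: levels are sorted alphabetically
levels(factor(answers_chr))
# "10" "2" "3" "99999"

# Numeric input: levels are sorted numerically
levels(factor(answers_num))
# "2" "3" "10" "99999"
```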
I know how to reorder the levels one column at a time, for example:
survey$v2_a <- factor(survey$v2_a, levels = c("1", "2", "3", "4", "5", "6", "7", "8", "9", "10", "No answer"))
survey$v2_b <- factor(survey$v2_b, levels = c("1", "2", "3", "4", "5", "6", "7", "8", "9", "10", "No answer"))
survey$v2_c <- factor(survey$v2_c, levels = c("1", "2", "3", "4", "5", "6", "7", "8", "9", "10", "No answer"))
...
But this is a lot of work when you have to do it for 105 different questions. Does anyone know a shorter way? I tried something like:
survey <- factor(survey, levels = c("1","2", "3", "4","5","6","7","8","9","10","No answer"))
But this definitely doesn't work.
Any additional arguments you give to lapply are passed on to the function it applies, so something like this
survey[] <- lapply(survey, factor, levels = c(1:10, "No answer"))
would probably work.
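A minimal check of this on a made-up two-column stand-in for survey, taken after the 99999 to "No answer" recoding. The level text has to match the recoded value exactly, including case: any value that is not among the levels silently becomes NA.

```r
# Made-up stand-in for the survey after recoding 99999 to "No answer"
survey <- data.frame(
  v2_a = c("10", "2", "No answer"),
  v2_b = c("3", "10", "1"),
  stringsAsFactors = FALSE
)

# `levels` is passed through lapply's ... to every factor() call
survey[] <- lapply(survey, factor, levels = c(1:10, "No answer"))

levels(survey$v2_a)   # "1" ... "10" followed by "No answer"
table(survey$v2_b)    # counts listed in rating order 1..10
```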
If you want to be more explicit about it, you could do:
ffun <- function(x) factor(x, levels = c(1:10, "No answer"))
survey[] <- lapply(survey, ffun)
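Since the original columns are numeric (dbl), the character conversion can arguably be skipped altogether: factor() accepts a labels argument, so the codes 1 to 10 and 99999 can be mapped to their labels in one step. A sketch under that assumption; rating_factor() is a hypothetical helper name, and the data are made up:

```r
# Hypothetical helper: numeric codes 1-10 and 99999 -> labelled factor
rating_factor <- function(x) {
  factor(x,
         levels = c(1:10, 99999),                     # codes, in the desired order
         labels = c(as.character(1:10), "No answer")) # one label per code
}

# Made-up numeric data standing in for the original survey columns
survey <- data.frame(v2_a = c(10, 2, 99999), v2_b = c(3, 10, 1))
survey[] <- lapply(survey, rating_factor)

levels(survey$v2_a)        # "1" ... "10" "No answer"
as.character(survey$v2_a)  # "10" "2" "No answer"
```

Any code that is not listed in levels (a typo such as 11, say) becomes NA, so checking anyNA() on the result is a cheap sanity test.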
You could also try reading in your data with na.strings = "99999" (or whatever your missing-value code is) in the first place, so that your no-answer cases are automatically converted to NA.
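A sketch of the na.strings route, using read.csv() with an inline text string in place of the real file (the column names and values are made up). Because the columns stay numeric, factor() with levels = 1:10 keeps the numeric order, and unanswered items remain NA:

```r
# Inline CSV standing in for the real survey file
csv <- "v2_a,v2_b
10,3
99999,10
2,1"

# na.strings turns every 99999 into NA while reading
survey <- read.csv(text = csv, na.strings = "99999")

# Columns are still numeric, so the levels come out as 1..10
survey[] <- lapply(survey, factor, levels = 1:10)

# useNA = "ifany" also counts the unanswered items
table(survey$v2_a, useNA = "ifany")
```

If you would rather have the missing answers as an explicit level, base R's addNA() adds NA to a factor's levels.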