Reputation: 1040
I was wondering if anyone has a solution to the following problem when cleaning survey data in R.
Let’s say a survey has Q1 “What is your gender”: Male, Female, Prefer not to say. No one in the survey selects “Prefer not to say”, so when I run the frequency I only see:
Q1 Male: 8, Female: 8.
Is there a way to code “Prefer not to say” into Q1 so that when I run the frequency I see:
Q1 Male: 8, Female: 8, Prefer not to say: 0.
Here is some sample data & code:
library(tidyverse)  # provides read_table2(), %>%, mutate(), across() and fct_recode()
dat_in<-read_table2("ID Gender
1 1
2 1
3 1
4 1
5 1
6 2
7 2
8 2
9 2
10 2
11 2
12 2
13 1
14 2
15 1
16 2
")
data_cat <- dat_in %>%
  mutate_if(is.numeric, as.character) %>%
  mutate(across(matches("Gender"), ~ fct_recode(., "Female" = "1", "Male" = "2")))

lapply(select_if(data_cat, is.factor), function(x) {
  data.frame(table(x))
})
Upvotes: 0
Views: 86
Reputation: 887048
Change it to a factor with the levels specified explicitly, so that even if a level has no elements, it still returns a frequency count of 0
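# use the recoded column: dat_in$Gender still holds the numeric codes 1/2, which match none of these labels
table(factor(data_cat$Gender, levels = c("Male", "Female", "Prefer not to say")))
-output
Male Female Prefer not to say
   9      7                 0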
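Because the sample Gender column stores numeric codes rather than labels, the codes can also be mapped to labels in the same factor() call. This is only a minimal sketch: the 1 = Female, 2 = Male mapping comes from the question's fct_recode call, while the code 3 for "Prefer not to say" is an assumption, since the question never states it.

```r
# Gender codes copied from the sample data (1 = Female, 2 = Male, per the question's recode)
gender <- c(1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 1, 2, 1, 2)

# Assumption: 3 is the survey code for "Prefer not to say"; it never occurs in the data
gender_f <- factor(gender,
                   levels = c(2, 1, 3),
                   labels = c("Male", "Female", "Prefer not to say"))

# table() reports every declared level, including the unused one
freq <- table(gender_f)
print(freq)
```

Declaring the levels up front means the unused category is kept in every later tabulation of gender_f, not only this one.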
If there are many variables of character/factor class, loop over those columns and add "Prefer not to say" as a new level
i1 <- sapply(dat_in, function(x) is.character(x) | is.factor(x))
dat_in[i1] <- lapply(dat_in[i1], function(x) {
  if (is.factor(x)) {
    levels(x) <- c(levels(x), "Prefer not to say")
  } else {
    x <- factor(x, levels = c(unique(x), "Prefer not to say"))
  }
  x
})
Or, if we are using the tidyverse, this can be done with fct_expand from forcats
library(dplyr)
library(forcats)
dat_in <- dat_in %>%
  mutate(across(where(~ is.factor(.) | is.character(.)),
                ~ fct_expand(., "Prefer not to say")))
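To see what fct_expand does in isolation, here is a small sketch on a standalone vector. The responses vector is hypothetical, made up purely for illustration, and is not part of the question's data.

```r
library(forcats)

# Hypothetical responses in which nobody chose "Prefer not to say"
responses <- c("Male", "Female", "Female", "Male")

# fct_expand() converts the character vector to a factor and appends the new level
responses_f <- fct_expand(responses, "Prefer not to say")

# The added level is tabulated with a count of 0
counts <- table(responses_f)
print(counts)
```

The existing levels keep their order and the new level is appended at the end, so frequency tables list it last.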
Upvotes: 2