Reputation: 818
Survey data often contains multiple choice columns with entries separated by commas, for instance:
library("tidyverse")
my_survey <- tibble(
id = 1:5,
question.1 = 1:5,
question.2 = c("Bus", "Bus, Walk, Cycle", "Cycle", "Bus, Cycle", "Walk")
)
It's desirable to have a function multiple_choice_tally
that will tally the unique responses for the question:
my_survey %>%
multiple_choice_tally(question = question.2)
### OUTPUT:
# A tibble: 3 x 2
response count
<chr> <int>
1 Bus 3
2 Walk 2
3 Cycle 3
What is the most efficient and flexible way to construct multiple_choice_tally
, without any hard coding.
Upvotes: 0
Views: 583
Reputation: 39154
We can use separate_rows
from the tidyr
package to expand the contents in question.2
. Since you are using tidyverse
, tidyr
has been already loaded with library("tidyverse")
and we don't have to load it again. my_survey2
is the final output.
my_survey2 <- my_survey %>%
separate_rows(question.2) %>%
count(question.2) %>%
rename(response = question.2, count = n)
my_survey2
# A tibble: 3 × 2
response count
<chr> <int>
1 Bus 3
2 Cycle 3
3 Walk 2
We can convert the above code into a function as follows.
multiple_choice_tally <- function(survey.data, question){
question <- enquo(question)
survey.data2 <- survey.data %>%
separate_rows(!!question) %>%
count(!!question) %>%
setNames(., c("response", "count"))
return(survey.data2)
}
my_survey %>%
multiple_choice_tally(question = question.2)
# A tibble: 3 x 2
response count
<chr> <int>
1 Bus 3
2 Cycle 3
3 Walk 2
Upvotes: 3
Reputation: 818
My current solution for this problem is as follows:
multiple_choice_tally <- function(survey.data, question) {
## Require a sym for the RHS of !!response := if_else
question_as_quo <- enquo(question)
question_as_string <- quo_name(question_as_quo)
target_question <- rlang::sym(question_as_string)
## Collate unique responses to the question
unique_responses <- survey.data %>%
select(!!target_question) %>%
na.omit() %>%
.[[1]] %>%
strsplit(",") %>%
unlist() %>%
trimws() %>%
unique()
## Extract responses to question
question_tally <- survey.data %>%
select(!!target_question) %>%
na.omit()
## Iteratively create a column for each unique response
invisible(lapply(unique_responses,
function(response) {
question_tally <<- question_tally %>%
mutate(!!response := if_else(str_detect(!!target_question, response), TRUE, FALSE))
}))
## Gather into tidy form
question_tally %>%
summarise_if(is.logical, funs(sum)) %>%
gather(response, value = count)
}
Which can then be used as follows:
library("tidyverse")
library("rlang")
library("stringr")
my_survey <- tibble(
id = 1:5,
question.1 = 1:5,
question.2 = c("Bus", "Bus, Walk, Cycle", "Cycle", "Bus, Cycle", "Walk")
)
my_survey %>%
multiple_choice_tally(question = question.2)
### OUTPUT:
# A tibble: 3 x 2
response count
<chr> <int>
1 Bus 3
2 Walk 2
3 Cycle 3
Upvotes: 0