Reputation: 35
I have a dataset where I need to calculate the % of respondents who answered "yes" to a series of questions, ignoring the respondents who didn't answer the question. Each question is made up of two subset questions - whether the respondent did the thing in the past month, and whether they did it in the past year.
The dataset looks something like this. In this example respondents are indicating whether they ate specific types of dessert in the past year and the past month:
df <- read.table(text = "1 yes yes yes yes yes yes
2 no yes unknown unknown no no
3 no no no yes no yes
4 unknown unknown no yes yes yes",
header = F)
colnames(df) <- c("id", "ice cream - past month", "ice cream - past year",
"chocolate - past month", "chocolate - past year",
"cookies - past month", "cookies - past year")
I'd like to create a table summarizing the % of respondents who said yes (ignoring the "unknowns") for each question, generating a table that looks like this.
Dessert | Past month | Past year |
---|---|---|
Ice Cream | 33.3% | 66.6% |
Chocolate | 33.3% | 100.0% |
Cookies | 50.0% | 75.0% |
the work-around I've created is creating a small table of the % of respondents who answered yes for each variable.
An example of that code is as follows:
icecream.month <- df %>%
filter(`ice cream - past month` == "yes" | `ice cream - past month` == "no") %>%
tabyl(`ice cream - past month`) %>%
adorn_totals("row") %>%
adorn_pct_formatting(rounding = "half up", digits = 1) %>%
select(-c("n")) %>%
rename("past month" = "percent") %>%
rename("ice cream" = "ice cream - past month") %>%
subset(`ice cream` == "yes")
I would essentially repeat this code for each dessert and each time period. Then used "bind_rows" and "full_join" arguments to bring all the values into one table.
As you can see, this is pretty clunky. I'm very new to loops, but I suspect there is a way to do this pretty simply with a loop. Any advice would be truly appreciated!
Thanks
Upvotes: 1
Views: 194
Reputation: 6227
No need for loops - is this what you're after?
df %>%
tidyr::pivot_longer(-id) %>%
tidyr::separate(name, sep=" - ", into = c("food", "time")) %>%
dplyr::group_by(food, time) %>%
dplyr::filter(value != "unknown") %>%
dplyr::summarise(a = sum(value == "yes") / n() * 100) %>%
tidyr::pivot_wider(names_from="time", values_from="a")
# A tibble: 3 x 3
# Groups: food [3]
food `past month` `past year`
<chr> <dbl> <dbl>
1 chocolate 33.3 100
2 cookies 50 75
3 ice cream 33.3 66.7
Upvotes: 3