Using a loop in R to calculate the frequency of a specific outcome for multiple columns

Question

I have a dataset where I need to calculate the % of respondents who answered "yes" to a series of questions, ignoring the respondents who didn't answer the question. Each question is made up of two subset questions - whether the respondent did the thing in the past month, and whether they did it in the past year.

The dataset looks something like this. In this example respondents are indicating whether they ate specific types of dessert in the past year and the past month:

df <- read.table(text = "1 yes yes yes yes yes yes
                 2 no yes unknown unknown no no
                 3 no no no yes no yes
                 4 unknown unknown no yes yes yes",
                 header = F)

colnames(df) <- c("id", "ice cream - past month", "ice cream - past year",
                  "chocolate - past month", "chocolate - past year",
                  "cookies - past month", "cookies - past year")

I'd like to create a table summarizing the % of respondents who said yes (ignoring the "unknowns") for each question, generating a table that looks like this.

Dessert	Past month	Past year
Ice Cream	33.3%	66.6%
Chocolate	33.3%	100.0%
Cookies	50.0%	75.0%

the work-around I've created is creating a small table of the % of respondents who answered yes for each variable.

An example of that code is as follows:

icecream.month <- df %>%
  filter(`ice cream - past month` == "yes" | `ice cream - past month` == "no") %>%
  tabyl(`ice cream - past month`) %>%
  adorn_totals("row") %>%
  adorn_pct_formatting(rounding = "half up", digits = 1) %>%
  select(-c("n")) %>%
  rename("past month" = "percent") %>%
  rename("ice cream" = "ice cream - past month") %>%
  subset(`ice cream` == "yes")

I would essentially repeat this code for each dessert and each time period. Then used "bind_rows" and "full_join" arguments to bring all the values into one table.

As you can see, this is pretty clunky. I'm very new to loops, but I suspect there is a way to do this pretty simply with a loop. Any advice would be truly appreciated!

Thanks

user438383 · Accepted Answer

No need for loops - is this what you're after?

df %>% 
    tidyr::pivot_longer(-id) %>% 
    tidyr::separate(name, sep=" - ", into = c("food", "time")) %>% 
    dplyr::group_by(food, time) %>% 
    dplyr::filter(value != "unknown") %>% 
    dplyr::summarise(a = sum(value == "yes") / n() * 100) %>% 
    tidyr::pivot_wider(names_from="time", values_from="a")

# A tibble: 3 x 3
# Groups:   food [3]
  food      `past month` `past year`
                     
1 chocolate         33.3       100  
2 cookies           50          75  
3 ice cream         33.3        66.7

Using a loop in R to calculate the frequency of a specific outcome for multiple columns

Answers (1)

Related Questions