user2494353
user2494353

Reputation: 35

Using a loop in R to calculate the frequency of a specific outcome for multiple columns

I have a dataset where I need to calculate the % of respondents who answered "yes" to a series of questions, ignoring the respondents who didn't answer the question. Each question is made up of two subset questions - whether the respondent did the thing in the past month, and whether they did it in the past year.

The dataset looks something like this. In this example respondents are indicating whether they ate specific types of dessert in the past year and the past month:

df <- read.table(text = "1 yes yes yes yes yes yes
                 2 no yes unknown unknown no no
                 3 no no no yes no yes
                 4 unknown unknown no yes yes yes",
                 header = F)

colnames(df) <- c("id", "ice cream - past month", "ice cream - past year",
                  "chocolate - past month", "chocolate - past year",
                  "cookies - past month", "cookies - past year")

I'd like to create a table summarizing the % of respondents who said yes (ignoring the "unknowns") for each question, generating a table that looks like this.

Dessert Past month Past year
Ice Cream 33.3% 66.6%
Chocolate 33.3% 100.0%
Cookies 50.0% 75.0%

the work-around I've created is creating a small table of the % of respondents who answered yes for each variable.

An example of that code is as follows:

icecream.month <- df %>%
  filter(`ice cream - past month` == "yes" | `ice cream - past month` == "no") %>%
  tabyl(`ice cream - past month`) %>%
  adorn_totals("row") %>%
  adorn_pct_formatting(rounding = "half up", digits = 1) %>%
  select(-c("n")) %>%
  rename("past month" = "percent") %>%
  rename("ice cream" = "ice cream - past month") %>%
  subset(`ice cream` == "yes")

I would essentially repeat this code for each dessert and each time period. Then used "bind_rows" and "full_join" arguments to bring all the values into one table.

As you can see, this is pretty clunky. I'm very new to loops, but I suspect there is a way to do this pretty simply with a loop. Any advice would be truly appreciated!

Thanks

Upvotes: 1

Views: 194

Answers (1)

user438383
user438383

Reputation: 6227

No need for loops - is this what you're after?

df %>% 
    tidyr::pivot_longer(-id) %>% 
    tidyr::separate(name, sep=" - ", into = c("food", "time")) %>% 
    dplyr::group_by(food, time) %>% 
    dplyr::filter(value != "unknown") %>% 
    dplyr::summarise(a = sum(value == "yes") / n() * 100) %>% 
    tidyr::pivot_wider(names_from="time", values_from="a")
# A tibble: 3 x 3
# Groups:   food [3]
  food      `past month` `past year`
  <chr>            <dbl>       <dbl>
1 chocolate         33.3       100  
2 cookies           50          75  
3 ice cream         33.3        66.7

Upvotes: 3

Related Questions