call variable that has been grouped by

Question

Some sample data:

 df <- data.frame(lang = rep(c("A", "B", "C"), 3), 
                  answer = rep(c("1", "2", "3"), each=3))

I am getting an error when I try to call a variable that I recently grouped by:

 df2 <- df %>%
   Total = count(lang) %>%  # count is short hand for tally + group_by()
   filter(answer=='2') %>% 
   mutate(prop = NROW(answer)/NROW(Total)) 

 Error in group_vars(x) : object 'lang' not found

I would like a new column on my dataframe that says the proportion of the answer '2' to total observations in each level of lang. So how many times does '2' occur in 'A' in proportion to the total number of observations in 'A'?

GenesRus · Accepted Answer

Here's a solution that does what you want:

df %>% 
  group_by(lang) %>% 
  summarize(
    prop = length(lang[answer == "2"]) / n()
  )

Here, we group by the variable (or variables) that define the groups you want proportions for. Inside summarize, we take the length of one of the variables subset to the rows where answer is 2, which is the number of 2s in the group, and divide it by n(), the number of rows in that group. If you want to keep the answer column alongside prop, just change summarize to mutate.
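As a minimal sketch of the mutate variant mentioned above, using the sample data from the question: the name df_with_prop is only illustrative, and sum(answer == "2") is swapped in as an equivalent way to count the matching rows. The ungroup() at the end is optional but avoids carrying the grouping into later steps.

```r
library(dplyr)

# Sample data from the question
df <- data.frame(lang = rep(c("A", "B", "C"), 3),
                 answer = rep(c("1", "2", "3"), each = 3))

# mutate keeps every original row and column and adds prop,
# repeating the group's proportion on each row of that group
df_with_prop <- df %>%
  group_by(lang) %>%
  mutate(prop = sum(answer == "2") / n()) %>%
  ungroup()
```

With this data each level of lang has three rows, one of which is a "2", so prop should come out as 1/3 on every row.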

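The reason you were getting the error about not finding lang is that count is a verb like mutate: it takes the data frame as its first argument (supplied by the pipe) and cannot be assigned to a name in the middle of a pipeline. Written as Total = count(lang), count is called with lang in the data position, and no object called lang exists outside the data frame. To name the count column, use the name argument instead, i.e.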

df %>% 
  count(lang, name = "Total")

You could achieve the same thing by adapting your code, but you should use add_count (so your answer column is preserved) or, after group_by(lang), mutate(Total = n()) instead of count. That said, group_by was designed to address problems such as this and is definitely worth spending some time to learn about.

df %>% 
  add_count(lang, name = "Total") %>% 
  filter(answer == "2") %>% 
  add_count(lang, name = "Twos") %>% 
  distinct(lang, .keep_all = TRUE) %>% 
  mutate(prop = Twos/Total) %>% 
  select(lang, prop)
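The mutate(Total = n()) route mentioned above could be sketched as follows. This is one possible arrangement rather than the only one; the name prop_by_lang is illustrative, and the sketch assumes the sample data from the question.

```r
library(dplyr)

# Sample data from the question
df <- data.frame(lang = rep(c("A", "B", "C"), 3),
                 answer = rep(c("1", "2", "3"), each = 3))

prop_by_lang <- df %>%
  group_by(lang) %>%
  mutate(Total = n()) %>%       # group size, stored on every row
  filter(answer == "2") %>%     # the grouping by lang is kept after filtering
  summarize(prop = n() / first(Total))  # remaining rows / original group size
```

One caveat applies to both this version and the add_count version: a level of lang with no "2" answers disappears at the filter step, so it will be missing from the result rather than showing a proportion of 0. The group_by and summarize solution at the top does not have that problem.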
