NICE8xx

Reputation: 21

group_by and summarize() multiple things in R using dplyr/tidyverse

I am trying to find the country with the highest average age, but I also need to filter out countries with fewer than 5 entries in the data frame. I tried the following, but it does not work:

bil %>% 
  group_by(citizenship,age) %>% 
  mutate(n=count(citizenship), theMean=mean(age,na.rm=T)) %>% 
  filter(n>=5) %>% 
  arrange(desc(theMean))

bil is the dataset. I am trying to count how many entries there are for each country, filter out countries with fewer than 5 entries, compute the average age for each country, and then find the country with the highest average. I am confused about how to do both things at the same time: if I do one summarize at a time, I lose the rest of my data.

Upvotes: 0

Views: 339

Answers (1)

akrun

Reputation: 887831

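Perhaps this could help. Note that the parameter 'x' in count is a tbl/data.frame, so count cannot be applied to a single column inside mutate. Instead, we group by 'citizenship' only, get the number of rows per group with n(), get the mean of 'age', and then do the filter. I am not sure 'age' belongs among the grouping variables: with it included, each group holds a single citizenship/age combination, so the group mean would just be that age.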

bil %>%
  group_by(citizenship) %>%
  mutate(n = n(), theMean = mean(age, na.rm = TRUE)) %>%
  filter(n >= 5) %>%
  arrange(desc(theMean))
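If one row per country is enough, both statistics can also be computed in a single summarise() call, which may address the concern about losing data when summarizing one thing at a time. The sketch below is only an illustration: the small 'bil' tibble is made-up stand-in data (the real dataset is not shown in the question), and 'by_country' is a hypothetical name.

```r
library(dplyr)

# Made-up stand-in for the asker's data; the real `bil` is not shown.
bil <- tibble(
  citizenship = c(rep("A", 6), rep("B", 5), rep("C", 2)),
  age = c(50, 60, 70, 80, 90, NA, 40, 45, 50, 55, 60, 95, 99)
)

by_country <- bil %>%
  group_by(citizenship) %>%
  # Both statistics are computed in one summarise() call,
  # giving one row per country.
  summarise(n = n(), theMean = mean(age, na.rm = TRUE)) %>%
  filter(n >= 5) %>%          # drop countries with fewer than 5 entries
  arrange(desc(theMean))      # highest average age first

by_country
```

The first row of the result is then the country with the highest average age among those with at least 5 entries. Note that n() counts all rows in the group, including rows where 'age' is NA; if only non-missing ages should count, sum(!is.na(age)) could be used instead.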

Upvotes: 2
