Doug Fir

dplyr summarise and then summarise_at in the same pipe

This question has come up before and there are some solutions, but none that I could find for this specific case. For example:

library(dplyr)
library(ggplot2) # provides the diamonds dataset

my_diamonds <- diamonds %>% 
  mutate(blah_var1 = rnorm(n()),
         blah_var2 = rnorm(n()),
         blah_var3 = rnorm(n()),
         blah_var4 = rnorm(n()),
         blah_var5 = rnorm(n()))

my_diamonds %>% 
  group_by(cut) %>% 
  summarise(MaxClarity = max(clarity),
            MinTable = min(table), .groups = 'drop') %>% 
  summarise_at(vars(contains('blah')), mean)

I want a new data frame showing the max clarity, the min table, and the mean of each of the blah variables. The code above returned an empty tibble. Based on some other SO posts, I tried using mutate and then summarise_at:

my_diamonds %>% 
  group_by(cut) %>% 
  mutate(MaxClarity = max(clarity),
         MinTable = min(table)) %>% 
  summarise_at(vars(contains('blah')), mean)

This returns a tibble, but only with the blah variables; MaxClarity and MinTable are missing.

Is there a way to combine summarise and summarise_at in the same dplyr chain?

Answers (1)

akrun

The problem with the first attempt is that after the first summarise call, only the grouping column ('cut') and the summarised columns ('MaxClarity' and 'MinTable') remain, so the 'blah' columns no longer exist when summarise_at runs. In addition, the grouping is removed by .groups = 'drop'. Instead, compute everything in a single summarise call, using across for the 'blah' columns:

library(dplyr) # version >= 1.0
my_diamonds %>% 
  group_by(cut) %>% 
  summarise(MaxClarity = max(clarity),
            MinTable = min(table),
            # lambda form: passing na.rm through across()'s ... is deprecated in dplyr >= 1.1.0
            across(contains('blah'), ~ mean(.x, na.rm = TRUE)), .groups = 'drop')
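
To check the pattern without the full diamonds data, here is a minimal sketch on a tiny made-up tibble. The names toy, grp, size, MaxSize and MinSize are hypothetical and only stand in for my_diamonds, cut, table, MaxClarity and MinTable; it assumes dplyr >= 1.0.0 so that across() is available.

```r
library(dplyr) # assumes dplyr >= 1.0.0 for across()

# Hypothetical toy data standing in for my_diamonds
toy <- tibble(
  grp       = c("a", "a", "b", "b"),
  size      = c(1, 5, 2, 8),
  blah_var1 = c(1, 3, 10, NA),
  blah_var2 = c(2, 4, 6, 8)
)

# One summarise call: named summaries plus across() for the blah columns
res <- toy %>%
  group_by(grp) %>%
  summarise(
    MaxSize = max(size),
    MinSize = min(size),
    across(contains("blah"), ~ mean(.x, na.rm = TRUE)),
    .groups = "drop"
  )

print(res)
```

The result should have one row per group, with grp, MaxSize and MinSize followed by one mean column per blah variable, which is the shape asked for in the question.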
