KLenny

Reputation: 105

R summarize across with multiple functions

I have a data frame that I am grouping by county, and I am trying to summarize the rest of the data using summarise(across()). I would like to sum some of the variables and average others.

Here is my sample data:

dat <- data.frame("county" = c("a", "a", "b", "b", "c", "c"), 
                 "pop" = c(10,20,30,40, 40, 20),
                 "men" = c(5, 15, 15, 25, 15, 10),
                 "crime_rate"= c(4,3, 2, 1, 6, 2),
                 "rate_2" = c(1, 2, 1, 4, 3, 10))

Here is what I've tried:

dat_summary <- dat %>%
  group_by(county) %>%
  summarise(across(c(pop, men), sum)) %>%
  summarise(across(c(crime_rate, rate_2), average))

I know that summarise(across) works if I were to just sum the population or the number of men, and would also work if I just try to find the average of the rates - but how can I get both to work and give me a summary data frame with all the information I need?

The only other way I can think of doing this is to create a data frame grouping and summarizing across for the sum variables, then repeating for the average variables, and then joining all together.
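For reference, the two-step workaround described above might look roughly like the sketch below. It assumes dplyr is installed, and the intermediate names `dat_sums`, `dat_means`, and `dat_joined` are hypothetical, chosen only for illustration.

```r
library(dplyr)

dat <- data.frame(county = c("a", "a", "b", "b", "c", "c"),
                  pop = c(10, 20, 30, 40, 40, 20),
                  men = c(5, 15, 15, 25, 15, 10),
                  crime_rate = c(4, 3, 2, 1, 6, 2),
                  rate_2 = c(1, 2, 1, 4, 3, 10))

# Step 1: sum the count variables per county
dat_sums <- dat %>%
  group_by(county) %>%
  summarise(across(c(pop, men), sum))

# Step 2: average the rate variables per county
dat_means <- dat %>%
  group_by(county) %>%
  summarise(across(c(crime_rate, rate_2), mean))

# Step 3: join the two summaries back together on the grouping variable
dat_joined <- inner_join(dat_sums, dat_means, by = "county")
```

This works, but it groups the data twice and needs a join, which is what I am hoping to avoid.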

Is there a way for me to do this all in one code sequence? Thanks! *Note: the rates I am working with are n/100,000, so an average will work in this instance.

Upvotes: 3

Views: 4554

Answers (1)

Len Greski

Reputation: 10855

As long as the grouping variables stay the same, we can include multiple across() calls within a single invocation of summarise(), each applying a different function to its own subset of variables in the input data frame.

dat <- data.frame("county" = c("a", "a", "b", "b", "c", "c"), 
                  "pop" = c(10,20,30,40, 40, 20),
                  "men" = c(5, 15, 15, 25, 15, 10),
                  "crime_rate"= c(4,3, 2, 1, 6, 2),
                  "rate_2" = c(1, 2, 1, 4, 3, 10))
library(dplyr)

dat_summary <- dat %>%
  group_by(county) %>%
  # sum the counts and average the rates in one summarise() call
  summarise(across(c(pop, men), sum),
            across(c(crime_rate, rate_2), mean))

...and the output:

> dat %>%
+      group_by(county) %>%
+      summarise(across(c(pop, men), sum), 
+      across(c(crime_rate, rate_2), mean))
# A tibble: 3 × 5
  county   pop   men crime_rate rate_2
  <chr>  <dbl> <dbl>      <dbl>  <dbl>
1 a         30    20        3.5    1.5
2 b         70    40        1.5    2.5
3 c         60    25        4      6.5
> 
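If it helps to see which function produced each column, one possible variation is to pass a named list of functions together with the `.names` argument of across(). This is a sketch rather than part of the original solution: it assumes dplyr 1.0.0 or later, and the labels `total` and `avg` and the object name `dat_named` are arbitrary choices made for illustration.

```r
library(dplyr)

dat <- data.frame(county = c("a", "a", "b", "b", "c", "c"),
                  pop = c(10, 20, 30, 40, 40, 20),
                  men = c(5, 15, 15, 25, 15, 10),
                  crime_rate = c(4, 3, 2, 1, 6, 2),
                  rate_2 = c(1, 2, 1, 4, 3, 10))

# A named list plus .names appends the function label to each output column
dat_named <- dat %>%
  group_by(county) %>%
  summarise(across(c(pop, men), list(total = sum), .names = "{.col}_{.fn}"),
            across(c(crime_rate, rate_2), list(avg = mean), .names = "{.col}_{.fn}"))

names(dat_named)
```

The values are the same as in the output above; only the column names change, to pop_total, men_total, crime_rate_avg, and rate_2_avg.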

Upvotes: 1
