dplyr: group_by, sum various columns, and apply a function based on grouped row sums?

Question

I'm trying to use dplyr to summarize a dataframe of bird species abundance in forests which are fragmented to some degree.

The first column, Percent_cover, has four possible values: 10, 25, 50, 75. The other ten columns hold bird species counts: 'species1' through 'species10'.

I want to group by Percent_cover, sum the species columns within each group, and then express each sum as a percentage of its row total (one row per group, so four row totals).

Getting the column sums is easy enough:

df %>% group_by(Percent_cover) %>% summarise_at(vars(contains("species")), sum)

...but what I need is sum/rowSum*100. It seems that some kind of 'rowwise' operation is needed.

Also, out of interest, why does the following not work?

df %>% group_by(Percent_cover) %>% summarise_at(vars(contains("species")), sum*100)

At this point, it's tempting to go back to 'for' loops... or Excel pivot tables.

Ronak Shah · Accepted Answer

To do this with dplyr, try the following:

library(dplyr)

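# Sum each species column within each cover class, then express each
# sum as a percentage of its row total (rs).
result <- df %>%
  group_by(Percent_cover) %>%
  summarise(across(contains("species"), sum)) %>%
  mutate(rs = rowSums(select(., contains("species")))) %>%
  mutate(across(contains("species"), ~ .x / rs * 100))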

result
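
As a minimal sketch of what the pipeline returns, here is the same chain run on a small made-up data frame. The name `toy_df` and all of its values are hypothetical (only the column naming follows the question), and three species columns stand in for the ten.

```r
library(dplyr)

# Hypothetical toy data: two plots per cover class, three species columns.
toy_df <- data.frame(
  Percent_cover = c(10, 10, 25, 25),
  species1 = c(1, 3, 2, 3),
  species2 = c(4, 2, 0, 5),
  species3 = c(5, 5, 8, 2)
)

toy_result <- toy_df %>%
  group_by(Percent_cover) %>%
  # One row per cover class, holding the per-species sums.
  summarise(across(contains("species"), sum)) %>%
  # Row total across the species columns only.
  mutate(rs = rowSums(select(., contains("species")))) %>%
  # Convert each species sum to a percentage of its row total.
  mutate(across(contains("species"), ~ .x / rs * 100))

toy_result
```

With these values, each row of the species columns should add up to 100, while `rs` keeps the raw total that was used as the denominator.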

For example, using mtcars:

mtcars %>%
  group_by(cyl) %>%
  summarise(across(disp:wt, sum)) %>%
  mutate(rs = rowSums(select(., disp:wt))) %>%
  mutate(across(disp:wt, ~./rs * 100))

#    cyl  disp    hp  drat    wt    rs
#  <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#1     4  54.2  42.6 2.10  1.18  2135.
#2     6  58.7  39.2 1.15  0.998 2186.
#3     8  62.0  36.7 0.567 0.702 7974.
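
On the side question of why `sum*100` does not work: as far as I can tell, R evaluates `sum*100` as an ordinary expression before dplyr ever sees it, and multiplying a function object by a number is an error. Wrapping the calculation in a lambda should avoid that. The sketch below uses mtcars again; the names `attempt` and `scaled` are only illustrative.

```r
library(dplyr)

# `sum` is a function, so `sum * 100` fails on its own, outside any pipeline.
attempt <- try(sum * 100, silent = TRUE)
inherits(attempt, "try-error")

# Passing a lambda instead: sum each column per group, then scale by 100.
scaled <- mtcars %>%
  group_by(cyl) %>%
  summarise(across(disp:wt, ~ sum(.x) * 100))

scaled
```

The same lambda should also work with the older `summarise_at(vars(...), ~ sum(.x) * 100)` form used in the question.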
