speendo
speendo

Reputation: 13345

Use `dplyr` to divide rows by group

On my attempt to learn dplyr, I want to divide each row by another row, representing the corresponding group's total.

I generated test data with

library(dplyr)

# building test data
data("OrchardSprays")

totals <- OrchardSprays %>% group_by(treatment) %>%
  summarise(decrease = sum(decrease))
totals$decrease <- totals$decrease + seq(10, 80, 10)
totals$rowpos = totals$colpos <- "total"

df <- rbind(OrchardSprays, totals)

Note the line totals$decrease <- totals$decrease + seq(10, 80, 10): for the sake of the question, I assumed there was an additional decrease for each treatment, which was not observed in the single lines of the data frame but only in the "total" lines for each group.

What I now want to do is adding another column decrease_share to the data frame where each line's decrease value is divided by the corresponding treatment groups total decrease value.

So, for head(df) I would expect an output like this

> head(df)
  decrease rowpos colpos treatment treatment_decrease
1       57      1      1         D           0.178125
2       95      2      1         E          0.1711712
3        8      3      1         B         0.09876543
4       69      4      1         H         0.08603491
5       92      5      1         G          0.1488673
6       90      6      1         F          0.1470588

My real world example is a bit more complex (more group variables and also more levels), therefore I am looking for a suitable solution in dplyr.

Upvotes: 0

Views: 152

Answers (1)

Ian Campbell
Ian Campbell

Reputation: 24878

Here's a total dplyr approach:

library(dplyr) #version >= 1.0.0
OrchardSprays %>% 
  group_by(treatment) %>%
  summarise(decrease = sum(decrease)) %>%
  mutate(decrease = decrease + seq(10, 80, 10),
         rowpos = "total",
         colpos = "total") %>% 
  bind_rows(mutate(OrchardSprays, across(rowpos:colpos, as.character))) %>%
  group_by(treatment) %>%
  mutate(treatment_decrease = decrease / decrease[rowpos == "total"])
# A tibble: 72 x 5
# Groups:   treatment [8]
   treatment decrease rowpos colpos treatment_decrease
   <fct>        <dbl> <chr>  <chr>               <dbl>
 1 A               47 total  total               1    
 2 B               81 total  total               1    
 3 C              232 total  total               1    
 4 D              320 total  total               1    
 5 E              555 total  total               1    
 6 F              612 total  total               1    
 7 G              618 total  total               1    
 8 H              802 total  total               1    
 9 D               57 1      1                   0.178
10 E               95 2      1                   0.171
# … with 62 more rows

Upvotes: 1

Related Questions