overlapping groups in dplyr

Question

I'm trying to calculate "rolling" summary statistics based on a grouping factor. Is there a nice way to process by (overlapping) groups, based on (say) an ordered factor?

As an example, say I want to calculate the sum of val by groups

df <- data.frame(grp = c("a", "a", "b", "b", "c", "c", "c"),
                 val = rnorm(7))

For groups based on grp, it's easy:

df %>% group_by(grp) %>% summarise(total = sum(val))

# result:
  grp   total
1   a  1.6388
2   b  0.7421
3   c  1.1707

However, what I want to do is calculate "rolling" sums for successive groups ("a" & "b", then "b" & "c", etc.). The desired output would be something like this:

  grp1 grp2   total
1    a    b  1.6388
2    b    c  0.7421

I'm having trouble doing this in dplyr. In particular, I can't seem to figure out how to get "overlapping" groups - the "b" rows in the above example should end up in two output groups.

G. Grothendieck · Accepted Answer

Try lag:

df %>% 
   group_by(grp) %>% 
   arrange(grp) %>%
   summarise(total = sum(val)) %>% 
   mutate(grp1 = lag(grp), grp2 = grp, total = total + lag(total)) %>%
   select(grp1, grp2, total) %>%
   na.omit

overlapping groups in dplyr

Answers (1)

Related Questions