Using dplyr for dynamic group_by

Question

Trying to get my head around this dplyr thingy. I have a sorted data frame that I want to group based on a variable. However, the groups need to be constructed so that each of them has a sum of at least 30 on the grouping variable.

Consider this small example data frame:

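# Build the example row by row, then convert the matrix to a data frame
# so that dplyr verbs can be applied to it.
df1 <- as.data.frame(matrix(
  data = c( 5, 0.9, 95,
           12, 0.8, 31,
           16, 0.8, 28,
           17, 0.7, 10,
           23, 0.8, 11,
           55, 0.6,  9,
           56, 0.5, 12,
           57, 0.2,  1,
           59, 0.4,  1),
  ncol = 3,
  byrow = TRUE,
  dimnames = list(1:9, c('freq', 'mean', 'count'))
))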

Now, I want to group the rows so that count has a sum of at least 30 in each group. freq and mean should then be collapsed into a weighted.mean where the weights are the count values. Note that the last "bin" reaches a sum of 32 by row 7, but since rows 8:9 only sum to 2, which is too little for a bin of their own, I add them to the last "bin".

Like so:

freq   mean   count
 5.00  0.90   95
12.00  0.80   31
16.26  0.77   38
45.18  0.61   34

The simple summarizing with dplyr is not a problem, but this I can't figure out. I do think the solution is hidden somewhere here:

Dynamic Grouping in R | Grouping based on condition on applied function

But how to apply it to my situation escapes me.
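
For reference, the grouping rule described above can be sketched roughly as follows. This is only one possible approach, not a confirmed solution: min_sum_groups is a hypothetical helper name, the bin column is introduced purely for illustration, and the sketch assumes the rows are already sorted and that a trailing remainder should always be merged into the previous bin.

```r
library(dplyr)

# Same values as the example above, built directly as a data frame.
df1 <- data.frame(
  freq  = c(5, 12, 16, 17, 23, 55, 56, 57, 59),
  mean  = c(0.9, 0.8, 0.8, 0.7, 0.8, 0.6, 0.5, 0.2, 0.4),
  count = c(95, 31, 28, 10, 11, 9, 12, 1, 1)
)

# Hypothetical helper: walk down x and close the current group as soon
# as its running sum reaches min_sum. Returns one integer id per element.
min_sum_groups <- function(x, min_sum) {
  id <- integer(length(x))
  current <- 1L
  running <- 0
  for (i in seq_along(x)) {
    id[i] <- current
    running <- running + x[i]
    if (running >= min_sum) {
      current <- current + 1L
      running <- 0
    }
  }
  # Rows still carrying the id `current` never reached min_sum,
  # so fold them into the previous group (if there is one).
  if (current > 1L) {
    id[id == current] <- current - 1L
  }
  id
}

result <- df1 %>%
  mutate(bin = min_sum_groups(count, 30)) %>%
  group_by(bin) %>%
  summarise(
    freq  = weighted.mean(freq, count),
    mean  = weighted.mean(mean, count),
    count = sum(count)   # summed last so the weights above use the raw counts
  )

print(result)
```

For the example data the helper assigns the bins 1, 2, 3, 3, 4, 4, 4, 4, 4, and the summarised values, rounded to two decimals, agree with the wanted table above. The explicit loop is used because each bin boundary depends on where the previous one ended, which a plain cumsum on count cannot express directly.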
