dplyr sum by group by run ID

Question

My data

is.over.sma ID.No   total.shares.months total.cost.months   total.shares.years  total.cost.years    RunID
1   79  0   0   0   0   7
1   80  0.79906924  833.3333333 0   0   7
1   81  0   0   0   0   7
1   82  0   0   0   0   7
1   83  0   0   0   0   7
1   84  0   0   0   0   7
1   85  0   0   0   0   7
1   86  0   0   0   0   7
1   87  0   0   0   0   7
1   88  0.56    700 0   0   7
1   89  0   0   0   0   7
1   90  0   0   0   0   7
1   91  0   0   0   0   7
1   92  0   0   0   0   7
1   93  0   0   0   0   7
1   94  0.78    900 0   0   8
1   95  0   0   0   0   8
1   96  0   0   0   0   8
1   97  0   0   0   0   8
1   98  0   0   0   0   8
1   99  0   0   0   0   8
1   100 0.751522595 833.3333333 0   0   8
1   101 0   0   0   0   8
1   102 0   0   0   0   8

The intention is to group by RunID. Then within each group sum total.shares.months columns and total.cost.months.

This is what I am trying:

# Dplyr to group by over.sma
output.sma <- df %>%
  dplyr::mutate(RunID = ifelse(is.over.sma == 1,data.table::rleid(is.over.sma),0)) %>%
  group_by(RunID) %>%
  mutate(ID.No = ifelse(is.over.sma == 1,row_number(),0)) %>%
  dplyr::mutate(sum.shares.over.sma = ifelse(is.over.sma ==1,sum(total.shares.months),0)) %>%  # Divide total purchased by cost price for total share
  dplyr::mutate(sum.cost.over.sma = ifelse(is.over.sma ==1,sum(total.cost.months),0))
  ungroup() %>%
  select(-RunID)

Desired output for RunID 7 the sums should =

sum.shares.over.sma = 1.359
sum.cost.over.sma = 1533.33

For RunID 8:

sum.shares.over.sma = 1.531
sum.cost.over.sma = 1733.33

jazzurro · Accepted Answer

Seeing your expected outcome, I think you could do the following. I called your data set mydf. You group the data by RunID. Then, you want to apply sum() to total.shares.months and total.cost.months, which you can do in summarise_at().

group_by(mydf, RunID) %>%
summarise_at(vars(total.shares.months:total.cost.months),
             funs(sum(., na.rm = TRUE))
            )

  RunID total.shares.months total.cost.months
                              
1     7                1.36              1533
2     8                1.53              1733

dplyr sum by group by run ID

Answers (2)

Related Questions