Andrew Bannerman
Andrew Bannerman

Reputation: 1305

dplyr sum by group by run ID

My data

is.over.sma ID.No   total.shares.months total.cost.months   total.shares.years  total.cost.years    RunID
1   79  0   0   0   0   7
1   80  0.79906924  833.3333333 0   0   7
1   81  0   0   0   0   7
1   82  0   0   0   0   7
1   83  0   0   0   0   7
1   84  0   0   0   0   7
1   85  0   0   0   0   7
1   86  0   0   0   0   7
1   87  0   0   0   0   7
1   88  0.56    700 0   0   7
1   89  0   0   0   0   7
1   90  0   0   0   0   7
1   91  0   0   0   0   7
1   92  0   0   0   0   7
1   93  0   0   0   0   7
1   94  0.78    900 0   0   8
1   95  0   0   0   0   8
1   96  0   0   0   0   8
1   97  0   0   0   0   8
1   98  0   0   0   0   8
1   99  0   0   0   0   8
1   100 0.751522595 833.3333333 0   0   8
1   101 0   0   0   0   8
1   102 0   0   0   0   8

The intention is to group by RunID. Then within each group sum total.shares.months columns and total.cost.months.

This is what I am trying:

# Dplyr to group by over.sma
output.sma <- df %>%
  dplyr::mutate(RunID = ifelse(is.over.sma == 1,data.table::rleid(is.over.sma),0)) %>%
  group_by(RunID) %>%
  mutate(ID.No = ifelse(is.over.sma == 1,row_number(),0)) %>%
  dplyr::mutate(sum.shares.over.sma = ifelse(is.over.sma ==1,sum(total.shares.months),0)) %>%  # Divide total purchased by cost price for total share
  dplyr::mutate(sum.cost.over.sma = ifelse(is.over.sma ==1,sum(total.cost.months),0))
  ungroup() %>%
  select(-RunID)

Desired output for RunID 7 the sums should =

sum.shares.over.sma = 1.359
sum.cost.over.sma = 1533.33

For RunID 8:

sum.shares.over.sma = 1.531
sum.cost.over.sma = 1733.33

Upvotes: 1

Views: 1499

Answers (2)

vaettchen
vaettchen

Reputation: 7659

There is the accepted answer already and it's a perfect solution - just to show the aggregate version:

x <- merge( aggregate(total.shares.months + total.shares.years ~ RunID, data = mydf, sum ), 
            aggregate(total.cost.months + total.cost.years ~ RunID, data = mydf, sum ))
colnames( x )[2:3] <- c( "sum.shares.over.sma", "sum.cost.over.sma" )
x
  RunID sum.shares.over.sma sum.cost.over.sma
1     7            1.359069          1533.333
2     8            1.531523          1733.333

Upvotes: 2

jazzurro
jazzurro

Reputation: 23574

Seeing your expected outcome, I think you could do the following. I called your data set mydf. You group the data by RunID. Then, you want to apply sum() to total.shares.months and total.cost.months, which you can do in summarise_at().

group_by(mydf, RunID) %>%
summarise_at(vars(total.shares.months:total.cost.months),
             funs(sum(., na.rm = TRUE))
            )

  RunID total.shares.months total.cost.months
  <int>               <dbl>             <dbl>
1     7                1.36              1533
2     8                1.53              1733

Upvotes: 3

Related Questions