user33484

Reputation: 568

Cumulative sum on preceding rows in the same column - R

data:

    test <- structure(list(
      fgu_clms = c(14621389.763697, 145818119.352026, 21565415.2337476,
                   20120830.8221406, 12999772.0950838),
      loss_to_layer = c(0, 125818119.352026, 1565415.23374765,
                        120830.822140567, 0)
    ), row.names = c(NA, 5L), class = "data.frame")

    > test
       fgu_clms loss_to_layer
    1  14621390           0.0
    2 145818119   125818119.4
    3  21565415     1565415.2
    4  20120831      120830.8
    5  12999772           0.0

I want to create a new column in which each value depends on the cumulative sum of the rows above it in that same new column. It's easiest to show how the calculation works row by row, where x2 is a fixed constant:

Row 1: first calculate the sum of the new column's values in the rows above. As this is row 1, there are no rows above, so this sum is 0; call it cumsum_1. The new value is then the minimum of the row 1 value in column "loss_to_layer" and "x2 - cumsum_1".

Row 2: calculate the cumulative sum of the new column over the rows above, which here is just the row 1 result, i.e. min(loss_to_layer value in row 1, x2 - cumsum_1). Call this cumsum_2. The new value is then the minimum of the row 2 value in the loss_to_layer column and x2 - cumsum_2.

And so on.

In Excel, this would be done by using MIN(B2,x2 - SUM(C$1:C1)) and dragging this formula down.

The results with x2 = 127,000,000 should be:

       fgu_clms loss_to_layer   new_col
    1  14621390           0.0         0
    2 145818119   125818119.4 125818119
    3  21565415     1565415.2   1181881
    4  20120831      120830.8         0
    5  12999772           0.0         0

As you can see, the values in "new_col" sum back up to "x2", in this case 127,000,000.

I have tried:

    test <- test %>% mutate(new_col = pmin(loss_to_layer, 127e6 - cumsum(lag(new_col, 1, default = 0))))

But I get an error, as it cannot find the column new_col inside the lag() call.
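
The error is expected: new_col does not exist yet when the right-hand side of mutate() is evaluated, so a column cannot refer to its own earlier rows there. As a point of comparison, here is a minimal sketch of the row-by-row rule written as an explicit base R loop. The helper name `capped_running_min` is hypothetical (not from any package), and x2 = 127e6 is taken from the example above.

```r
# Hypothetical helper: fills the new column one row at a time,
# mirroring the Excel formula MIN(B2, x2 - SUM(C$1:C1)).
capped_running_min <- function(loss, x2) {
  out  <- numeric(length(loss))
  used <- 0  # sum of the new column over the rows above the current one
  for (i in seq_along(loss)) {
    out[i] <- min(loss[i], x2 - used)
    used   <- used + out[i]
  }
  out
}

# loss_to_layer values copied from the data above
loss_to_layer <- c(0, 125818119.352026, 1565415.23374765, 120830.822140567, 0)
new_col <- capped_running_min(loss_to_layer, x2 = 127e6)
```

Up to floating-point rounding, this should reproduce the expected new_col shown above. The loop is slower than a vectorised version on large data, but it follows the description literally.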

Upvotes: 0

Views: 1205

Answers (1)

det

Reputation: 5232

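    library(dplyr)

    test %>%
      mutate(
        # total loss_to_layer in the rows above (0 for the first row)
        cumsum_1 = cumsum(lag(loss_to_layer, default = 0)),
        # cap each row at whatever is left of 127,000,000
        new_col = pmin(loss_to_layer, 127000000 - cumsum_1),
        # once the limit is used up the remainder is negative, so floor at 0
        new_col = ifelse(new_col < 0, 0, new_col)
      ) %>%
      select(-cumsum_1)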
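
A note on why this works without referring to new_col recursively: as long as loss_to_layer is never negative (an assumption; it holds for the sample data), the running total of new_col is simply the running total of loss_to_layer capped at x2. Under that same assumption, the result can also be sketched in base R as the row-to-row difference of the capped cumulative sum. The variable names below are illustrative only.

```r
# Assumes every loss_to_layer value is >= 0; otherwise this shortcut
# no longer matches the row-by-row rule from the question.
x2 <- 127e6
loss_to_layer <- c(0, 125818119.352026, 1565415.23374765, 120830.822140567, 0)

# Running total of losses, capped at x2 ...
capped_total <- pmin(cumsum(loss_to_layer), x2)
# ... and the amount each row adds to that capped total.
new_col <- diff(c(0, capped_total))
```

Rows after the cap is reached come out as exactly 0 here, because consecutive capped totals are identical.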

Upvotes: 1
