msmarchena
msmarchena

Reputation: 113

How to create a column which use its own lag value using dplyr

Suppose I have the following data frame

c1<- c(1:10)
c2<- c(11:20)
df<- data.frame(c1,c2)

c1  c2
1   11
2   12
3   13
4   14
5   15
6   16
7   17
8   18
9   19
10  20

I would like to add a column c3 which is the sum of c3(-1)+c2-c1. For instance, in the example above the expected result will be:

c1  c2  c3
1   11  0
2   12  10
3   13  20
4   14  30
5   15  40
6   16  50
7   17  60
8   18  70
9   19  80
10  20  90

Is it possible to perform this operation using dplyr ? I have tried several approaches without success. Any suggestion will be much appreciated.

Upvotes: 1

Views: 696

Answers (2)

Andre Elrico
Andre Elrico

Reputation: 11500

This creates column c3. Assuming the first entry is always 0, since there is no preceding element.

df$c3 <- df$c2 - df$c1
df[1,"c3"] <- 0
df$c3 <- cumsum(df$c3)

output

> df
   c1 c2 c3
1   1 11  0
2   2 12 10
3   3 13 20
4   4 14 30
5   5 15 40
6   6 16 50
7   7 17 60
8   8 18 70
9   9 19 80
10 10 20 90
> 

Upvotes: 3

Gregor Thomas
Gregor Thomas

Reputation: 146224

This is a good use for cumsum - cumulative summation.

c3 = lag(cumsum(c2 - c1), default = 0)

Don't think of c3 as c3(-1) + c2 - c1, think of it as c3(n) = sum (from 1 to n - 1) c2(i) - c1(i)

Upvotes: 4

Related Questions