user1172468

Reputation: 5474

How do I elegantly calculate a variable in an R data.frame that uses values in a previous row?

Here is a simple scenario I constructed:

Say I have the following:

set.seed(1)
id<-sample(3,10,replace = TRUE)
n<-1:10
x<-round(runif(10,30,40))
df<-data.frame(id,n,x)
df
   id  n  x
1   1  1 32
2   2  2 32
3   2  3 37
4   3  4 34
5   1  5 38
6   3  6 35
7   3  7 37
8   2  8 40
9   2  9 34
10  1 10 38

How do I elegantly calculate x.lag, where x.lag is the previous x for the same id, or 0 if no previous value exists?

This is what I did, but I'm not happy with it:

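# Start with 0 everywhere, then fill in the lag one id group at a time
df$x.lag<-rep(0,10)
# Loop variable is `i` rather than `id`, so the global id vector is not overwritten
for (i in 1:3)
 df[df$id==i,]$x.lag<-c(0,df[df$id==i,]$x)[1:sum(df$id==i)]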
df
   id  n  x x.lag
1   1  1 32     0
2   2  2 32     0
3   2  3 37    32
4   3  4 34     0
5   1  5 38    32
6   3  6 35    34
7   3  7 37    35
8   2  8 40    37
9   2  9 34    40
10  1 10 38    38

Upvotes: 2

Views: 43

Answers (1)

akrun

Reputation: 887981

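We can use data.table, where shift() lags x by one row within each id group and fill = 0 supplies the value for the first row of a group: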

library(data.table)
setDT(df)[, x.lag := shift(x, fill = 0), by = id]

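Or with dplyr, using lag() with default = 0 inside a grouped mutate (assign the result back to df to keep the new column):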

library(dplyr)
df %>%
  group_by(id) %>%
  mutate(x.lag = lag(x, default = 0))

Or using ave from base R, which applies the function within each id group and returns the values in the original row order:

df$x.lag <- with(df, ave(x, id, FUN = function(x) c(0, x[-length(x)])))
df$x.lag
#[1]  0  0 32  0 32 34 35 37 40 38
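
As a rough way to check the base R approach without relying on the random seed, the data can be typed in from the table printed in the question. This may be useful because sample() changed its default sampling algorithm in R 3.6.0, so set.seed(1) may produce different id values on a newer R than the ones shown above. The helper name lag_fill is hypothetical and only used for illustration; it is a sketch of the same idea as the ave call, not a different method.

```r
# Data typed in from the question's printed table, so it does not depend on the RNG
df <- data.frame(
  id = c(1, 2, 2, 3, 1, 3, 3, 2, 2, 1),
  n  = 1:10,
  x  = c(32, 32, 37, 34, 38, 35, 37, 40, 34, 38)
)

# Hypothetical helper: shift a vector down by one, padding the front with `fill`
lag_fill <- function(v, fill = 0) c(fill, head(v, -1))

# ave() applies the helper within each id group and keeps the original row order
df$x.lag <- ave(df$x, df$id, FUN = lag_fill)
df$x.lag
```

The last line should print the same vector as the ave call above. Using head(v, -1) instead of x[-length(x)] is only a stylistic choice; both drop the last element of each group.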

Upvotes: 5
