Reputation: 4836
I have the following data frame.
date id value
2012-01-01 1 0.3
2012-01-01 2 0.5
2012-01-01 3 0.2
2012-01-01 4 0.8
2012-01-01 5 0.2
2012-01-01 6 0.8
2012-01-01 7 0.1
2012-01-01 8 0.4
2012-01-01 9 0.3
2012-01-01 10 0.2
There are several dates and for each date, I have 10 id values as shown above and a value field. What I would like to do is for every id find the previous n values in the "value" field. For example if n = 3 then I want the output to be as follows.
date id value value1 value2 value3
2012-01-01 1 0.3 NA NA NA
2012-01-01 2 0.5 NA NA NA
2012-01-01 3 0.2 NA NA NA
2012-01-01 4 0.8 0.2 0.5 0.3
2012-01-01 5 0.2 0.8 0.2 0.5
...
Is there an easy way to get to this either through plyr or using mapply? Thanks much in advance.
Upvotes: 6
Views: 3642
Reputation: 36
Just want to add to @thelatemail's answer (I couldn't directly comment bc of my reputation):
prevrows2 <- function(data,n) {
if (length(data) >= 10){
sapply(1:n,function(x) c(rep(NA,x),head(data,-x)))
} else {
cbind(sapply(1:length(data),function(x) c(rep(NA,x),head(data,-x))),
matrix(NA,nrow = length(data),ncol= n - length(data)))}
}
this addition protects against the case when the number of rows in a group is less than the number of rows you wish to select (n)
Upvotes: 0
Reputation: 2826
Using data.table
v1.9.5+ this is as simple as:
library(data.table)
setDT(dt)
lags <- dt[, shift(value, n = c(1,2,3))]
or to append them as additional columns in the same data.table:
dt[, c("lag1", "lag2", "lag3") := shift(value, n = c(1,2,3))]
Upvotes: 3
Reputation: 93813
You can do this quite easily using base functions:
id <- 1:10
value <- c(0.3,0.5,0.2,0.8,0.2,0.8,0.1,0.4,0.3,0.2)
test <- data.frame(id,value)
test$valprev1 <- c(rep(NA,1),head(test$value,-1))
test$valprev2 <- c(rep(NA,2),head(test$value,-2))
test$valprev3 <- c(rep(NA,3),head(test$value,-3))
Result
id value valprev1 valprev2 valprev3
1 1 0.3 NA NA NA
2 2 0.5 0.3 NA NA
3 3 0.2 0.5 0.3 NA
4 4 0.8 0.2 0.5 0.3
5 5 0.2 0.8 0.2 0.5
6 6 0.8 0.2 0.8 0.2
7 7 0.1 0.8 0.2 0.8
8 8 0.4 0.1 0.8 0.2
9 9 0.3 0.4 0.1 0.8
10 10 0.2 0.3 0.4 0.1
Made a mistake here previously - here is an sapply
version in a function:
prevrows <- function(data,n) {sapply(1:n,function(x) c(rep(NA,x),head(data,-x)))}
prevrows(test$value,3)
Which gives just this:
[,1] [,2] [,3]
[1,] NA NA NA
[2,] 0.3 NA NA
[3,] 0.5 0.3 NA
[4,] 0.2 0.5 0.3
[5,] 0.8 0.2 0.5
[6,] 0.2 0.8 0.2
[7,] 0.8 0.2 0.8
[8,] 0.1 0.8 0.2
[9,] 0.4 0.1 0.8
[10,] 0.3 0.4 0.1
You could then apply this to each set of dates in your data like this:
result <- tapply(test$value,test$date,prevrows,3)
Which gives a bunch of lists for each date set. You could rowbind these up for adding back to your data set with:
data.frame(test,do.call(rbind,result))
Upvotes: 6