Reputation: 8049
I want to create a new column in a data.table calculated from the current value of one column and the previous of another. Is it possible to access previous rows?
E.g.:
> DT <- data.table(A=1:5, B=1:5*10, C=1:5*100)
> DT
A B C
1: 1 10 100
2: 2 20 200
3: 3 30 300
4: 4 40 400
5: 5 50 500
> DT[, D := C + BPreviousRow] # What is the correct code here?
The correct answer should be
> DT
A B C D
1: 1 10 100 NA
2: 2 20 200 210
3: 3 30 300 320
4: 4 40 400 430
5: 5 50 500 540
Upvotes: 99
Views: 119494
Reputation: 195
it can be done in a loop.
# Create the column D
DT$D <- 0
# for every row in DT
for (i in 1:length(DT$A)) {
if(i==1) {
#using NA at first line
DT[i,4] <- NA
} else {
#D = C + BPreviousRow
DT[i,4] <- DT[i,3] + DT[(i-1), 2]
}
}
Using a for, you can even use the previous value of the row of this new column DT[(i-1), 4]
Upvotes: -2
Reputation: 31
Here is my intuitive solution:
#create data frame
df <- data.frame(A=1:5, B=seq(10,50,10), C=seq(100,500, 100))`
#subtract the shift from num rows
shift <- 1 #in this case the shift is 1
invshift <- nrow(df) - shift
#Now create the new column
df$D <- c(NA, head(df$B, invshift)+tail(df$C, invshift))`
Here invshift
, the number of rows minus 1, is 4. nrow(df)
provides you with the number of rows in a data frame or in a vector. Similarly, if you want to take still earlier values, subtract from nrow 2, 3, ...etc, and also put NA's accordingly at the beginning.
Upvotes: 2
Reputation: 118779
With shift()
implemented in v1.9.6, this is quite straightforward.
DT[ , D := C + shift(B, 1L, type="lag")]
# or equivalently, in this case,
DT[ , D := C + shift(B)]
From NEWS:
- New function
shift()
implements fastlead/lag
of vector, list, data.frames or data.tables. It takes atype
argument which can be either "lag" (default) or "lead". It enables very convenient usage along with:=
orset()
. For example:DT[, (cols) := shift(.SD, 1L), by=id]
. Please have a look at?shift
for more info.
See history for previous answers.
Upvotes: 116
Reputation: 21621
Using dplyr
you could do:
mutate(DT, D = lag(B) + C)
Which gives:
# A B C D
#1: 1 10 100 NA
#2: 2 20 200 210
#3: 3 30 300 320
#4: 4 40 400 430
#5: 5 50 500 540
Upvotes: 57
Reputation: 9805
Several folks have answered the specific question. See the code below for a general purpose function that I use in situations like this that may be helpful. Rather than just getting the prior row, you can go as many rows in the "past" or "future" as you'd like.
rowShift <- function(x, shiftLen = 1L) {
r <- (1L + shiftLen):(length(x) + shiftLen)
r[r<1] <- NA
return(x[r])
}
# Create column D by adding column C and the value from the previous row of column B:
DT[, D := C + rowShift(B,-1)]
# Get the Old Faithul eruption length from two events ago, and three events in the future:
as.data.table(faithful)[1:5,list(eruptLengthCurrent=eruptions,
eruptLengthTwoPrior=rowShift(eruptions,-2),
eruptLengthThreeFuture=rowShift(eruptions,3))]
## eruptLengthCurrent eruptLengthTwoPrior eruptLengthThreeFuture
##1: 3.600 NA 2.283
##2: 1.800 NA 4.533
##3: 3.333 3.600 NA
##4: 2.283 1.800 NA
##5: 4.533 3.333 NA
Upvotes: 24
Reputation: 3627
Based on @Steve Lianoglou 's comment above, why not just:
DT[, D:= C + c(NA, B[.I - 1]) ]
# A B C D
# 1: 1 10 100 NA
# 2: 2 20 200 210
# 3: 3 30 300 320
# 4: 4 40 400 430
# 5: 5 50 500 540
And avoid using seq_len
or head
or any other function.
Upvotes: 13
Reputation: 5617
Following Arun's solution, a similar results can be obtained without referring to to .N
> DT[, D := C + c(NA, head(B, -1))][]
A B C D
1: 1 10 100 NA
2: 2 20 200 210
3: 3 30 300 320
4: 4 40 400 430
5: 5 50 500 540
Upvotes: 9