Reputation: 2747
I have an R data frame df with the following content:
Serial N year current
B 10 14
B 10 16
B 11 10
B 11 NA
B 11 15
C 12 11
C 12 9
C 12 -13
C 12 17
. . .
I would like to find the difference between each consecutive pair of current values within the same Serial N. This is the code I wrote, but I am getting some strange results:
library(data.table)
setDT(df)[,mydiff:=diff(df$current),by=Serial N]
print(length(df$current))
The output I get for that column is quite strange:
2 6 NA NA NA 2 6 NA NA NA
What I would actually like to have is:
Serial N year current mydiff
B 10 14
B 10 16 16-14=2
B 11 10 10-16=-6
B 11 NA NA
B 11 15 15-10=5
C 12 11
C 12 9 9-11=-2
C 12 -13 -13-9=-22
C 12 17 17-(-13)=30
. . .
Is diff the right tool for this? If not, how can I tackle this (especially without using loops)?
Upvotes: 0
Views: 1066
Reputation: 1553
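This may work for you. You can carry the last non-NA value forward with na.locf from the zoo package, so that diff always has a previous value to subtract. The ifelse condition then populates my.diff only where current is not NA. Note that the " " placeholder for the first row of each group makes my.diff a character column.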
library(data.table)
library(zoo)
df <- read.table(textConnection("
'Serial N' year current
B 10 14
B 10 16
B 11 10
B 11 NA
B 11 15
C 12 11
C 12 9
C 12 -13
C 12 17"),header=TRUE)
setDT(df)
setkey(df,Serial.N)
df[,my.diff := ifelse(!is.na(current), c(" ",diff(na.locf(current))), NA),by=Serial.N]
# Serial.N year current my.diff
# 1: B 10 14
# 2: B 10 16 2
# 3: B 11 10 -6
# 4: B 11 NA NA
# 5: B 11 15 5
# 6: C 12 11
# 7: C 12 9 -2
# 8: C 12 -13 -22
# 9: C 12 17 30
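As for the strange results in the question: inside a grouped data.table call, df$current refers to the whole column rather than the current group's rows, and diff returns one element fewer than its input, so the result has to be padded to the group length. If you would rather keep my.diff numeric and avoid the zoo dependency, here is a minimal sketch under those assumptions. The helper name diff_skip_na is made up for illustration and is not part of any package; the first row of each group gets NA instead of a blank.

```r
library(data.table)

# Same sample data as above, built directly as a data.table
df <- data.table(
  Serial.N = c("B", "B", "B", "B", "B", "C", "C", "C", "C"),
  year     = c(10, 10, 11, 11, 11, 12, 12, 12, 12),
  current  = c(14, 16, 10, NA, 15, 11, 9, -13, 17)
)

# Hypothetical helper: difference between each value and the previous
# non-NA value; NA where x is NA and for the first non-NA value
diff_skip_na <- function(x) {
  out <- rep(NA_real_, length(x))
  ok <- which(!is.na(x))                 # positions of non-NA values
  if (length(ok) > 1) out[ok[-1]] <- diff(x[ok])
  out
}

# Use the bare column name so that diff is computed within each group
df[, my.diff := diff_skip_na(current), by = Serial.N]
print(df)
# my.diff: NA 2 -6 NA 5 NA -2 -22 30
```

Because the column stays numeric, you can go on to use it in sums or comparisons without converting it back from character.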
Upvotes: 1