user3841581

Reputation: 2747

Using diff() in R with NA and negative numbers

I have an R data frame df with the following content:

Serial N         year         current
   B              10            14
   B              10            16
   B              11            10
   B              11            NA
   B              11            15
   C              12            11
   C              12             9
   C              12           -13
   C              12            17
   .              .              .

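I would like to compute the difference between each consecutive pair of current values within the same Serial N. Where current is NA, the difference should be NA, and the next value should be compared with the last non-NA value. This is the code I wrote, but I am getting some strange results: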

library(data.table)
setDT(df)[,mydiff:=diff(df$current),by=`Serial N`]
print(length(df$current))

The output I get for that column is strange:

2 6  NA NA NA 2 6  NA NA NA 

What I would actually like to get is:

Serial N         year         current      mydiff
   B              10            14         
   B              10            16         16-14=2
   B              11            10         10-16=-6
   B              11            NA            NA
   B              11            15         15-10=5
   C              12            11
   C              12             9         9-11=-2    
   C              12           -13        -13-9=-22
   C              12            17         17-(-13)=30
   .              .              .

Is diff() the right tool for this? If not, how can I tackle it (ideally without using loops)?

Upvotes: 0

Views: 1066

Answers (1)

Wyldsoul

Reputation: 1553

This may work for you. na.locf() from the zoo package carries the last non-NA value forward, so diff() then compares each value with the most recent non-NA value before it. The ifelse() condition fills my.diff only where current is not NA.
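To see why carrying values forward helps, here is a small sketch on serial B's values alone, assuming the zoo package is installed. Plain diff() lets a single NA spoil both differences next to it, while diffing the carried-forward vector gives 15 - 10 = 5 for the row after the NA. The na.rm = FALSE argument is an addition of this sketch: it keeps any leading NAs in place so the result stays the same length as the input.

```r
library(zoo)  # provides na.locf()

# Values of current for serial B, taken from the example in the question
x <- c(14, 16, 10, NA, 15)

diff(x)
# Plain diff(): the NA contaminates both differences it touches -> 2 -6 NA NA

filled <- na.locf(x, na.rm = FALSE)
# Last observation carried forward -> 14 16 10 10 15

d <- c(NA, diff(filled))  # pad the front so there is one value per row
d[is.na(x)] <- NA         # no difference for rows where the value itself is NA
d
# -> NA  2 -6 NA  5
```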

library(data.table)
library(zoo)
df <- read.table(textConnection("
                         'Serial N'         year         current
                            B              10            14
                            B              10            16
                            B              11            10
                            B              11            NA
                            B              11            15
                            C              12            11
                            C              12             9
                            C              12            -13
                            C              12            17"),header=TRUE)

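# read.table() renamed the 'Serial N' header to Serial.N (check.names = TRUE)
setDT(df)              # convert to a data.table in place
setkey(df, Serial.N)   # sort and key by serial number
# Per group: carry the last non-NA value forward, take differences, pad the first row
# with " ", and set my.diff to NA wherever current itself is NA
df[, my.diff := ifelse(!is.na(current), c(" ", diff(na.locf(current, na.rm = FALSE))), NA), by = Serial.N]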


#        Serial.N year current my.diff
# 1:        B   10      14        
# 2:        B   10      16       2
# 3:        B   11      10      -6
# 4:        B   11      NA      NA
# 5:        B   11      15       5
# 6:        C   12      11        
# 7:        C   12       9      -2
# 8:        C   12     -13     -22
# 9:        C   12      17      30
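If you would rather avoid the zoo and data.table dependencies, a base R sketch is possible, swapping in ave(), which applies a function within each group and returns a vector the same length as its input. The helper name diff_from_last_valid is hypothetical, and the data frame is typed in by hand from the example. Unlike the version above, it keeps my.diff numeric, so the first row of each group is NA rather than a blank string.

```r
# Hypothetical helper: difference between each value and the previous non-NA value.
diff_from_last_valid <- function(x) {
  idx <- seq_along(x)
  idx[is.na(x)] <- 0L
  idx <- cummax(idx)            # position of the last non-NA value so far (0 if none)
  filled <- x[pmax(idx, 1L)]    # carry the last non-NA value forward
  out <- c(NA, diff(filled))    # the first value in a group has no predecessor
  out[is.na(x)] <- NA           # no difference where the value itself is NA
  out
}

df <- data.frame(
  Serial.N = c("B", "B", "B", "B", "B", "C", "C", "C", "C"),
  year     = c(10, 10, 11, 11, 11, 12, 12, 12, 12),
  current  = c(14, 16, 10, NA, 15, 11, 9, -13, 17)
)

# ave() applies the helper separately to each Serial.N group, keeping row order
df$my.diff <- ave(df$current, df$Serial.N, FUN = diff_from_last_valid)
print(df)
```

The cummax() step works because NA positions are set to 0, so the running maximum of the positions is always the index of the most recent non-NA value; when a group starts with NA, that index is clamped to 1, which still points at an NA.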

Upvotes: 1
