aaaaa
aaaaa

Reputation: 183

Find lag between non-consecutive observations

Let's suppose I have the following data.frame:

df = data.frame(x = c(1,3,5,6,8,11,15,16,18,20,21,22,24,25,40,50,54,55,70,71,100,101,102,103))

I need to find the lag between non-consecutive x values. This means authomatically doing: 3-1, 5-3, 8-6, 11-8, 15-11, 18-16, 20-18, 24-22, etc...

Any suggestion?

Upvotes: 0

Views: 128

Answers (6)

akrun
akrun

Reputation: 887213

Using head, tail to get the difference.

with(df, {i1 <- tail(x, -1) - head(x, -1); i1[i1 != 1]})
#[1]  2  2  2  3  4  2  2  2 15 10  4 15 29

NOTE: Not redoing the calculation twice

Upvotes: 1

Santiago I. Hurtado
Santiago I. Hurtado

Reputation: 1123

Do the subtraction like this:

 df_lag <- df[2:nrow(df),] - df[1:(nrow(df)-1),]

That will give you the lag you want.

df_lag[df_lag>1]
 [1]  2  2  2  3  4  2  2  2 15 10  4 15 29

Upvotes: 1

NelsonGon
NelsonGon

Reputation: 13319

Adapting @Ronak Shah's approach,we could achieve this with dplyr as follows:

library(dplyr)
  df %>%
  transmute(x=lead(x,1)-x) %>% 
  filter(x!=1)  
        x
    1   2
    2   2
    3   2
    4   3
    5   4
    6   2
    7   2
    8   2
    9  15
    10 10
    11  4
    12 15
    13 29

Upvotes: 2

Cole
Cole

Reputation: 11255

Here's another answer which appends the difference to the df as well as the starting x value:

df = data.frame(x = c(1,3,5,6,8,11,15,16,18,20,21,22,24,25,40,50,54,55,70,71,100,101,102,103))

df$x0 <- c(NA_integer_, df[1:(nrow(df)-1), 'x'])
df$difference = c(0, diff(df$x))

df[df$difference > 1, ]

#    x x0 difference
2    3  1          2
3    5  3          2
5    8  6          2
6   11  8          3
7   15 11          4
9   18 16          2
10  20 18          2
13  24 22          2
15  40 25         15
16  50 40         10
17  54 50          4
19  70 55         15
21 100 71         29

Upvotes: 1

Sotos
Sotos

Reputation: 51592

You basically need to construct your grouping variable and take it from there, i.e.

c(FALSE, cumsum(diff(df$x) == 1))
#[1]  0  0  0  1  1  1  1  2  2  2  3  4  4  5  5  5  5  6  6  7  7  8  9 10

You can then use that to do whatever calculations you want, for example the difference would be,

tapply(df$x, df$new, diff)

#$`0`
#[1] 2 2

#$`1`
#[1] 2 3 4

#$`2`
#[1] 2 2

#$`3`
#numeric(0)

#$`4`
#[1] 2

#$`5`
#[1] 15 10  4

#$`6`
#[1] 15

#$`7`
#[1] 29

#$`8`
#numeric(0)

#$`9`
#numeric(0)

#$`10`
#numeric(0)

Upvotes: 1

Ronak Shah
Ronak Shah

Reputation: 389012

We can take difference between all values but keep only those values which are not consecutive

with(df, diff(x)[diff(x) != 1])
#[1]  2  2  2  3  4  2  2  2 15 10  4 15 29

Upvotes: 4

Related Questions