Reputation: 183
Let's suppose I have the following data.frame:
df = data.frame(x = c(1,3,5,6,8,11,15,16,18,20,21,22,24,25,40,50,54,55,70,71,100,101,102,103))
I need to find the lag between non-consecutive x values. This means authomatically doing: 3-1, 5-3, 8-6, 11-8, 15-11, 18-16, 20-18, 24-22, etc...
Any suggestion?
Upvotes: 0
Views: 128
Reputation: 887213
Using head
, tail
to get the difference.
with(df, {i1 <- tail(x, -1) - head(x, -1); i1[i1 != 1]})
#[1] 2 2 2 3 4 2 2 2 15 10 4 15 29
NOTE: Not redoing the calculation twice
Upvotes: 1
Reputation: 1123
Do the subtraction like this:
df_lag <- df[2:nrow(df),] - df[1:(nrow(df)-1),]
That will give you the lag you want.
df_lag[df_lag>1]
[1] 2 2 2 3 4 2 2 2 15 10 4 15 29
Upvotes: 1
Reputation: 13319
Adapting @Ronak Shah's approach,we could achieve this with dplyr
as follows:
library(dplyr)
df %>%
transmute(x=lead(x,1)-x) %>%
filter(x!=1)
x
1 2
2 2
3 2
4 3
5 4
6 2
7 2
8 2
9 15
10 10
11 4
12 15
13 29
Upvotes: 2
Reputation: 11255
Here's another answer which appends the difference to the df as well as the starting x value:
df = data.frame(x = c(1,3,5,6,8,11,15,16,18,20,21,22,24,25,40,50,54,55,70,71,100,101,102,103))
df$x0 <- c(NA_integer_, df[1:(nrow(df)-1), 'x'])
df$difference = c(0, diff(df$x))
df[df$difference > 1, ]
# x x0 difference
2 3 1 2
3 5 3 2
5 8 6 2
6 11 8 3
7 15 11 4
9 18 16 2
10 20 18 2
13 24 22 2
15 40 25 15
16 50 40 10
17 54 50 4
19 70 55 15
21 100 71 29
Upvotes: 1
Reputation: 51592
You basically need to construct your grouping variable and take it from there, i.e.
c(FALSE, cumsum(diff(df$x) == 1))
#[1] 0 0 0 1 1 1 1 2 2 2 3 4 4 5 5 5 5 6 6 7 7 8 9 10
You can then use that to do whatever calculations you want, for example the difference would be,
tapply(df$x, df$new, diff)
#$`0`
#[1] 2 2
#$`1`
#[1] 2 3 4
#$`2`
#[1] 2 2
#$`3`
#numeric(0)
#$`4`
#[1] 2
#$`5`
#[1] 15 10 4
#$`6`
#[1] 15
#$`7`
#[1] 29
#$`8`
#numeric(0)
#$`9`
#numeric(0)
#$`10`
#numeric(0)
Upvotes: 1
Reputation: 389012
We can take difference between all values but keep only those values which are not consecutive
with(df, diff(x)[diff(x) != 1])
#[1] 2 2 2 3 4 2 2 2 15 10 4 15 29
Upvotes: 4