Reputation: 189
Let's say I have a data table (or frame) like this:
Time  Value
   1     10
   2     10
   3     10
   4     15
   5     15
   6     20
I'd like to attach columns that mark the time and value of the next change in Value. It would look like this:
Time  Value  T_Next  V_Next
   1     10       4      15
   2     10       4      15
   3     10       4      15
   4     15       6      20
   5     15       6      20
   6     20      NA      NA
I don't care too much about the last row. The only way I can think of to do this is with nested for loops, which would be slow on my large dataset. I'm sure there's a more R-style way to do it that performs better. Any help appreciated!
Upvotes: 2
Views: 1792
Reputation: 2644
I came across this old question while searching for how to find the first row in a data.table where a value changes compared to the previous row. Since the other data.table answer produces results that differ from what the question requested, here is another data.table option:
library(data.table)
# create a sample data table
dt <- data.table(Time = c(1, 2, 3, 4, 5, 6), Value = c(10, 10, 10, 15, 15, 20))
# at each row where Value differs from the preceding row, record that row's Time and Value
dt[Value != shift(Value, 1),
   `:=`(shifted_time = Time, shifted_value = Value)]
# back-fill the gaps with the next non-NA observation carried backward ("nocb")
dt[, `:=`(
  shifted_time = nafill(shifted_time, type = "nocb"),
  shifted_value = nafill(shifted_value, type = "nocb"))]
# the change rows still point at their own Time and Value, so replace those with the
# following change row (shift() in j only sees the rows selected in i)
dt[Value != shift(Value, 1),
   `:=`(
     shifted_time = shift(Time, -1),
     shifted_value = shift(Value, -1))]
Which results in:
dt
   Time Value shifted_time shifted_value
1:    1    10            4            15
2:    2    10            4            15
3:    3    10            4            15
4:    4    15            6            20
5:    5    15            6            20
6:    6    20           NA            NA
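The last step relies on data.table evaluating j only on the rows selected by i, so shift() there moves along the change rows rather than along the whole table. The following is a minimal sketch of that behaviour on the same sample data; the names change_rows and next_change_time are illustrative and not part of the answer above.

```r
library(data.table)

dt <- data.table(Time = c(1, 2, 3, 4, 5, 6), Value = c(10, 10, 10, 15, 15, 20))

# Rows where Value differs from the previous row. The first row compares
# against NA, and data.table treats NA in i as FALSE, so it is not selected.
change_rows <- dt[Value != shift(Value, 1)]
print(change_rows$Time)

# shift() in j sees only the selected rows, so a lead of one step
# means "the next change point", not "the next row of dt"
next_change_time <- dt[Value != shift(Value, 1), shift(Time, -1)]
print(next_change_time)
```

If the filter in i were dropped, shift(Time, -1) would simply return the Time of the following row, which is why the subset matters here.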
Upvotes: 1
Reputation: 886938
Here is another option with data.table. Convert the 'data.frame' to a 'data.table' and, grouping by the lag of rleid of 'Value', take the max of 'Time' and 'Value' to create two columns, 'T_next' and 'V_next'. Then change the last row to NA.
library(data.table)
setDT(df1)[, paste(c("T", "V"), "next", sep="_") := lapply(.SD, max),
by = .(shift(rleid(Value), fill = 1))][.N, c("T_next", "V_next") := rep(list(NA),2)][]
# Time Value T_next V_next
#1: 1 10 4 4
#2: 2 10 4 4
#3: 3 10 4 4
#4: 4 15 4 4
#5: 5 15 6 6
#6: 6 20 NA NA
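To see how the grouping in this answer behaves, it can help to inspect the key on its own. The sketch below assumes df1 holds the sample data from the question; run_id, group_key and max_time are illustrative names, and tapply() is used only to show the per-group maximum outside of data.table.

```r
library(data.table)

# Sample data from the question (assumed contents of df1)
df1 <- data.frame(Time = 1:6, Value = c(10, 10, 10, 15, 15, 20))

# Run id: increments each time Value changes
run_id <- rleid(df1$Value)
# Lagged run id, with the first element filled with 1: the grouping key
group_key <- shift(run_id, fill = 1)

# Inspect the key next to the data; row 4 carries the same key as rows 1 to 3
print(cbind(df1, run_id, group_key))

# Per-group maximum of Time under this key
max_time <- tapply(df1$Time, group_key, max)
print(max_time)
```

Because the lag moves each run boundary down by one row, the first row of a new run is grouped with the previous run, which is one way to read why row 4 is paired with Time 4 rather than 6.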
Upvotes: 1
Reputation: 9618
What about this?
cbind(df, df[rep(cumsum(rle(df$Value)$lengths) + 1, rle(df$Value)$lengths),])
    Time Value Time Value
4      1    10    4    15
4.1    2    10    4    15
4.2    3    10    4    15
6      4    15    6    20
6.1    5    15    6    20
NA     6    20   NA    NA
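The same idea can be unpacked into steps so that rle() is computed once and the new columns get the names used in the question. This is a sketch under the assumption that df holds the sample data; runs, next_start and idx are illustrative names.

```r
# Sample data from the question (assumed contents of df)
df <- data.frame(Time = 1:6, Value = c(10, 10, 10, 15, 15, 20))

# Run-length encoding of Value: lengths of each run of identical values
runs <- rle(df$Value)
# Row index at which the next run starts; for the last run this points past the end
next_start <- cumsum(runs$lengths) + 1
# Repeat each index once for every row in its run
idx <- rep(next_start, runs$lengths)

# Indexing past the end of a vector yields NA, which fills the last run
df$T_Next <- df$Time[idx]
df$V_Next <- df$Value[idx]
print(df)
```

Indexing the columns directly, rather than binding a second copy of the data frame, avoids the duplicated Time and Value column names and the altered row names seen in the output above.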
Upvotes: 4