Reputation: 189

R Find Next Row Where Value Changes

Let's say I have a data table (or frame) like this:

Time    Value
1       10
2       10
3       10
4       15
5       15
6       20

I'd like to attach columns that mark the time and value of the next change in Value. It would look like this:

Time    Value    T_Next    V_Next
1       10       4         15
2       10       4         15
3       10       4         15
4       15       6         20
5       15       6         20
6       20       NA        NA

I don't care too much about the last row. The only way I can think of to do this is with nested for loops, which would be slow on my large dataset. I'm sure there's a more idiomatic R way that also performs better. Any help appreciated!

Upvotes: 2

Answers (3)

ira

Reputation: 2644

I came across this old question while searching for how to find the first row in a data.table where a value changes compared to the previous row. Since the other data.table answer produces results that differ from what the question asked for, here is another data.table option:

library(data.table)

# create a sample data table
dt = data.table('Time' = c(1,2,3,4,5,6), 'Value' = c(10,10,10,15,15,20))

# add new columns storing Time and Value where Value differs from the preceding row's Value
dt[
  Value != shift(Value, 1),
  `:=`(shifted_time = Time, shifted_value = Value)]

# back-fill missing values with the next non-NA observation carried backward
dt[, `:=`(
  shifted_time = nafill(shifted_time, 'nocb'),
  shifted_value = nafill(shifted_value, 'nocb'))]

# the change rows still hold their own Time and Value at this point,
# so point them at the following change row instead
dt[Value != shift(Value, 1),
     `:=`(
       shifted_time = shift(Time, -1),
       shifted_value = shift(Value, -1))]

Which results in:

dt
   Time Value shifted_time shifted_value
1:    1    10            4            15
2:    2    10            4            15
3:    3    10            4            15
4:    4    15            6            20
5:    5    15            6            20
6:    6    20           NA            NA
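
To see why the third step is needed, it may help to look at the table after only the first two steps. The sketch below reuses the sample data and column names from the code above and stops before the final update; it is meant as an illustration, not as part of the solution.

```r
library(data.table)

dt <- data.table(Time = c(1, 2, 3, 4, 5, 6), Value = c(10, 10, 10, 15, 15, 20))

# Step 1: mark the rows whose Value differs from the previous row (rows 4 and 6)
dt[Value != shift(Value, 1), `:=`(shifted_time = Time, shifted_value = Value)]

# Step 2: carry each marked observation backward over the NAs that precede it
dt[, `:=`(
  shifted_time  = nafill(shifted_time, "nocb"),
  shifted_value = nafill(shifted_value, "nocb"))]

dt[]
#    Time Value shifted_time shifted_value
# 1:    1    10            4            15
# 2:    2    10            4            15
# 3:    3    10            4            15
# 4:    4    15            4            15
# 5:    5    15            6            20
# 6:    6    20            6            20
```

Rows 4 and 6 are the change rows themselves, so after the back-fill they still point at their own Time and Value. The last update overwrites exactly those rows with the values of the following change row.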

Upvotes: 1

akrun

Reputation: 887881

Here is another option with data.table. Convert the 'data.frame' to a 'data.table' with setDT, group by the lag of the rleid of 'Value', and take the max of 'Time' and 'Value' within each group to create the two columns 'T_next' and 'V_next'. Then set the last row to NA.

library(data.table)
setDT(df1)[,  paste(c("T", "V"), "next", sep="_") := lapply(.SD, max), 
   by = .(shift(rleid(Value), fill = 1))][.N, c("T_next", "V_next") := rep(list(NA),2)][]
#   Time Value T_next V_next
#1:    1    10      4      4
#2:    2    10      4      4
#3:    3    10      4      4
#4:    4    15      4      4
#5:    5    15      6      6
#6:    6    20     NA     NA
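
As a rough illustration of the grouping used above, the following sketch prints the run ids and the lagged run ids for the sample 'Value' column. The variable names `run_id` and `group_key` are made up for this example and do not appear in the answer's code.

```r
library(data.table)

Value <- c(10, 10, 10, 15, 15, 20)

# rleid() gives every run of identical consecutive values its own id
run_id <- rleid(Value)
run_id
# [1] 1 1 1 2 2 3

# lagging the ids by one row (filling the first position with 1) yields the grouping key
group_key <- shift(run_id, fill = 1)
group_key
# [1] 1 1 1 1 2 3
```

With this key, row 4 ends up in the same group as rows 1 to 3, and row 5 forms a group of its own.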

Upvotes: 1

DatamineR

Reputation: 9628

What about this?:

cbind(df, df[rep(cumsum(rle(df$Value)$lengths) + 1, rle(df$Value)$lengths),])
    Time Value Time Value
4      1    10    4    15
4.1    2    10    4    15
4.2    3    10    4    15
6      4    15    6    20
6.1    5    15    6    20
NA     6    20   NA    NA
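
The one-liner can be unpacked into its intermediate steps, which may make it easier to follow. This is a sketch on the question's sample data; the names `runs`, `next_start`, `idx` and `res` are chosen here for illustration, and the new columns are named after the question's T_Next and V_Next to avoid duplicated column names.

```r
df <- data.frame(Time = c(1, 2, 3, 4, 5, 6), Value = c(10, 10, 10, 15, 15, 20))

# lengths of the runs of identical consecutive values
runs <- rle(df$Value)
runs$lengths
# [1] 3 2 1

# row index at which the run following each run starts
# (the last one points past the end of the data)
next_start <- cumsum(runs$lengths) + 1
next_start
# [1] 4 6 7

# repeat each index once per row of its run, so every row knows where the next run starts
idx <- rep(next_start, runs$lengths)
idx
# [1] 4 4 4 6 6 7

# indexing past the last row returns NA, which fills the final row
res <- cbind(df, T_Next = df$Time[idx], V_Next = df$Value[idx])
res
#   Time Value T_Next V_Next
# 1    1    10      4     15
# 2    2    10      4     15
# 3    3    10      4     15
# 4    4    15      6     20
# 5    5    15      6     20
# 6    6    20     NA     NA
```

Indexing the two columns as vectors rather than indexing whole rows of df also keeps the original row names instead of 4, 4.1, 4.2 and so on.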

Upvotes: 4
