user2507320

Reputation: 189

R Find Next Row Where Value Changes

Let's say I have a data table (or frame) like this:

Time    Value
1       10
2       10
3       10
4       15
5       15
6       20

I'd like to attach columns that mark the time and value of the next change in Value. It would look like this:

Time    Value    T_Next    V_Next
1       10       4         15
2       10       4         15
3       10       4         15
4       15       6         20
5       15       6         20
6       20       NA        NA

I don't care too much about the last row. The only way I can think of to do this is with some nested for loops, which would be slow on my large dataset. I'm sure there's a more R-style way to do it that performs better. Any help appreciated!

Upvotes: 2

Views: 1792

Answers (3)

ira

Reputation: 2644

I came across this old question while searching for how to find the first row in a data.table where a value changes compared to the previous row. Since the other data.table answer gives results that differ from what the question requested, here is another data.table option:

library(data.table)

# create a sample data table
dt = data.table('Time' = c(1,2,3,4,5,6), 'Value' = c(10,10,10,15,15,20))

# on rows where Value differs from the preceding row, store that row's Time and Value
dt[
  Value != shift(Value, 1),
  `:=`(shifted_time = Time, shifted_value = Value)]

# back-fill missing values with the next non-NA observation carried backward
dt[, `:=`(
  shifted_time = nafill(shifted_time, 'nocb'),
  shifted_value = nafill(shifted_value, 'nocb'))]

# the change rows still point at themselves; within that subset,
# replace them with the Time and Value of the following change row
dt[Value != shift(Value, 1),
     `:=`(
       shifted_time = shift(Time, -1),
       shifted_value = shift(Value, -1))]

Which results in:

dt
   Time Value shifted_time shifted_value
1:    1    10            4            15
2:    2    10            4            15
3:    3    10            4            15
4:    4    15            6            20
5:    5    15            6            20
6:    6    20           NA            NA
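
To see why the third step is needed, it may help to look at the table right after the back-fill. The following is only a sketch that repeats the first two steps on the question's sample data; the name `after_fill` is introduced here for illustration and is not part of the answer above.

```r
library(data.table)

# sample data from the question
dt <- data.table(Time = c(1, 2, 3, 4, 5, 6), Value = c(10, 10, 10, 15, 15, 20))

# step 1: mark rows where Value differs from the previous row
dt[Value != shift(Value, 1), `:=`(shifted_time = Time, shifted_value = Value)]

# step 2: fill the gaps backwards (next observation carried backward)
dt[, `:=`(
  shifted_time  = nafill(shifted_time, type = "nocb"),
  shifted_value = nafill(shifted_value, type = "nocb"))]

# snapshot before step 3 (after_fill is a hypothetical name)
after_fill <- copy(dt)
print(after_fill)
```

At this point the change rows (Time 4 and 6) still carry their own Time and Value, so the last step overwrites exactly those rows with the following change.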

Upvotes: 1

akrun

Reputation: 886938

Here is another option with data.table. Convert the 'data.frame' to a 'data.table' and, grouped by the lag of the rleid of 'Value', take the max of 'Time' and 'Value' to create the two columns 'T_next' and 'V_next'. Then set the last row to NA.

library(data.table)
setDT(df1)[,  paste(c("T", "V"), "next", sep="_") := lapply(.SD, max), 
   by = .(shift(rleid(Value), fill = 1))][.N, c("T_next", "V_next") := rep(list(NA),2)][]
#   Time Value T_next V_next
#1:    1    10      4     15
#2:    2    10      4     15
#3:    3    10      4     15
#4:    4    15      4     15
#5:    5    15      6     20
#6:    6    20     NA     NA
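
The grouping key used above can be inspected on its own. This is a minimal sketch, assuming `df1` is the question's six-row sample data; `run_id` and `grp` are names made up for illustration.

```r
library(data.table)

# df1 is assumed to be the sample data from the question
df1 <- data.frame(Time = 1:6, Value = c(10, 10, 10, 15, 15, 20))

run_id <- rleid(df1$Value)         # run counter: 1 1 1 2 2 3
grp    <- shift(run_id, fill = 1)  # lagged by one row: 1 1 1 1 2 2

# show the key next to the data
print(data.frame(df1, run_id, grp))
```

Because the key is lagged, the first row of each new run (here Time 4) is grouped with the previous run, which is why row 4 ends up with T_next = 4 rather than 6.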

Upvotes: 1

DatamineR

Reputation: 9618

What about this?

cbind(df, df[rep(cumsum(rle(df$Value)$lengths) + 1, rle(df$Value)$lengths),])
    Time Value Time Value
4      1    10    4    15
4.1    2    10    4    15
4.2    3    10    4    15
6      4    15    6    20
6.1    5    15    6    20
NA     6    20   NA    NA
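
The one-liner can be unpacked step by step. The following is a sketch of the same rle-based indexing, assuming `df` holds the question's sample data; the intermediate names (`runs`, `next_row`, `idx`, `res`) are introduced here for illustration, and the new columns are named as in the question instead of being duplicated.

```r
# df is assumed to be the sample data from the question
df <- data.frame(Time = 1:6, Value = c(10, 10, 10, 15, 15, 20))

runs     <- rle(df$Value)                # run lengths: 3 2 1
next_row <- cumsum(runs$lengths) + 1     # row after each run ends: 4 6 7
idx      <- rep(next_row, runs$lengths)  # one index per row: 4 4 4 6 6 7

# indexing past the last row (7) yields NA, which covers the final run
res <- data.frame(df,
                  T_Next = df$Time[idx],
                  V_Next = df$Value[idx])
print(res)
```

Indexing the columns as vectors avoids both the duplicated column names and the odd row names (4, 4.1, ...) that come from repeating rows of a data frame.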

Upvotes: 4
