user189035

Reputation: 5789

Conditional time difference between rows. R & dplyr/data.table

So I have a data frame with two columns:

set.seed(123)
nrows   <- 100
my_data <- data.frame(side  = sample(c(-1, 1), nrows, replace = TRUE),
                      value = 1:nrows)

One column, value, measures time; the other, side, takes one of two values (-1 or 1).

For every row, I need to know how much time (the difference in value) has elapsed since the most recent earlier row whose side differs from the current row's side. In slow R code:

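slow_function <- function(my_data){
    stopifnot(!is.null(my_data$side))
    stopifnot(!is.null(my_data$value))

    value_past <- rep(NA, nrow(my_data))
    # start at row 2; seq_len() avoids the 2:1 trap on a one-row input
    for(i in seq_len(nrow(my_data))[-1]){
        # the side we are looking back for (opposite of the current row)
        opposite_side <- -1 * my_data$side[i]
        # index of the most recent earlier row on that side (NA if none)
        last_mirror   <- rev(which(my_data$side[1:(i-1)] == opposite_side))[1]
        value_past[i] <- my_data$value[i] - my_data$value[last_mirror]
    }
    return(value_past)
}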

my_data$res <- slow_function(my_data)

head(my_data)
#   side value res
# 1   -1     1  NA
# 2    1     2   1
# 3   -1     3   1
# 4    1     4   1
# 5    1     5   2
# 6   -1     6   1

I'd like to do this quickly using either dplyr or data.table.

Upvotes: 2

Views: 118

Answers (1)

Frank

Reputation: 66819

A non-equi update join works:

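library(data.table)
setDT(my_data)

# side of the rows each row should look back for
my_data[, other_side := -1*side ]

# self-join: x.side == i.other_side and x.value < i.value;
# mult="last" keeps the last matching row of x, which is the most recent
# one provided the rows are ordered by value
my_data[, v :=
  .SD[.SD, on=.(side = other_side, value < value), mult="last", i.value - x.value ]
]

# test (rows where res or v is NA are dropped by the comparison)
my_data[ res != v, .N ] # 0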

It's probably also possible to do this with a rolling join.
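
For what it's worth, here is a minimal sketch of what the rolling-join variant might look like. It rebuilds the question's toy data so that it runs on its own; the names dt and v_roll are made up for this example. One caveat: roll=TRUE matches on value <= value rather than strictly <, so it only agrees with the non-equi join when no row on the opposite side shares the current row's value (true here, since value is 1:nrows).

```r
library(data.table)

# Same toy data as in the question, built directly as a data.table.
set.seed(123)
nrows <- 100
dt <- data.table(side  = sample(c(-1, 1), nrows, replace = TRUE),
                 value = 1:nrows)

# Helper column: the side each row should look back for.
dt[, other_side := -1 * side]

# Rolling self-join: for each row of i, match x rows whose side equals
# i's other_side, then roll back to the largest x.value <= i.value.
# v_roll is a hypothetical column name chosen for this sketch.
dt[, v_roll := dt[dt, on = .(side = other_side, value), roll = TRUE,
                  i.value - x.value]]

head(dt)
```

Rows with no earlier opposite-side row get NA, as in slow_function.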

Upvotes: 3
