Reputation: 5789
So I have a data frame with two columns:
set.seed(123)
nrows <- 100
my_data <- data.frame(side = sample(c(-1, 1), nrows, replace = TRUE),
                      value = 1:nrows)
One column, value, measures time; the other, side, takes only two values (-1 and 1).
For every row, I need to know how much time (delta value) elapsed since the last occurrence of a row with side not equal to the current side. In slow R code:
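slow_function <- function(my_data){
  stopifnot(!is.null(my_data$side))
  stopifnot(!is.null(my_data$value))
  value_past <- rep(NA, nrow(my_data))
  # start at row 2; seq_len()[-1] is empty (not 2:1) when there is only one row
  for(i in seq_len(nrow(my_data))[-1]){
    # the side we are looking for is the opposite of the current one
    current_value <- -1 * my_data$side[i]
    # index of the most recent earlier row on the opposite side (NA if none)
    last_mirror <- rev(which(my_data$side[1:(i-1)] == current_value))[1]
    value_past[i] <- my_data$value[i] - my_data$value[last_mirror]
  }
  return(value_past)
}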
my_data$res <- slow_function(my_data)
head(my_data)
#   side value res
# 1   -1     1  NA
# 2    1     2   1
# 3   -1     3   1
# 4    1     4   1
# 5    1     5   2
# 6   -1     6   1
I'm trying to do this quickly using either dplyr or data.table magic.
Upvotes: 2
Views: 118
Reputation: 66819
A non-equi update join works:
library(data.table)
setDT(my_data)
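# each row looks for rows on the opposite side
my_data[, other_side := -1 * side]
# for each row (i), join to earlier rows (x) with x.side == i.other_side and
# x.value < i.value; mult = "last" keeps the last such row in x
my_data[, v :=
  .SD[.SD, on = .(side = other_side, value < value), mult = "last", i.value - x.value]
]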
# test: count rows where the two results differ (rows where the comparison is NA are dropped)
my_data[ res != v, .N ] # 0
It's probably also possible to do this with a rolling join.
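Since side takes only two values, the last row on the opposite side is always the final row of the previous run of identical sides. The sketch below uses that observation in base R with rle(); it is not a join and not the rolling join mentioned above. It assumes the rows are already sorted by value and that side has no NAs. The helper name time_since_other_side is hypothetical, and the toy vectors are the first six rows printed in the question.

```r
# Toy data: the first six rows shown in the question's output
side  <- c(-1, 1, -1, 1, 1, -1)
value <- 1:6

# Hypothetical helper (not from the original posts).
# Assumes: rows sorted by value, side has exactly two distinct values, no NAs.
time_since_other_side <- function(side, value) {
  if (length(side) == 0L) return(value)          # nothing to do for empty input
  runs     <- rle(side)                          # consecutive runs of equal side
  run_end  <- cumsum(runs$lengths)               # row index where each run ends
  prev_end <- c(NA_integer_, head(run_end, -1))  # end of the previous run (none for run 1)
  idx      <- rep(prev_end, runs$lengths)        # expand back to one index per row
  value - value[idx]                             # NA for rows in the first run
}

time_since_other_side(side, value)
# NA 1 1 1 2 1
```

On the full data this could be checked against the loop with something like all.equal(time_since_other_side(my_data$side, my_data$value), my_data$res). If side could take more than two values, the previous run would no longer be guaranteed to hold every "other" side, so this shortcut would not apply as written.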
Upvotes: 3