Reputation: 3449
I have irregular time series data frame with time
(seconds) and value
columns. I want to add another column, value_2
where values are lead by delay
seconds. So value_2
at time t
equals to value
at time t + delay
or right after that.
ts=data.frame(
time=c(1,2,3,5,8,10,11,15,20,23),
value=c(1,2,3,4,5,6,7,8,9,10)
)
ts_with_delayed_value <- add_delayed_value(ts, "value", 2, "time")
> ts_with_delayed_value
time value value_2
1 1 1 3
2 2 2 4
3 3 3 4
4 5 4 5
5 8 5 6
6 10 6 8
7 11 7 8
8 15 8 9
9 20 9 10
10 23 10 10
I have my own version of this function add_delayed_value
, here it is:
add_delayed_value <- function(data, colname, delay, colname_time) {
colname_delayed <- paste(colname, sprintf("%d", delay), sep="_")
data[colname_delayed] <- NaN
for (i in 1:nrow(data)) {
time_delayed <- data[i, colname_time] + delay
value_delayed <- data[data[colname_time] >= time_delayed, colname][1]
if (is.na(value_delayed)) {
value_delayed <- data[i, colname]
}
data[i, colname_delayed] <- value_delayed
}
return(data)
}
Is there a way to vectorize this routine to avoid the slow loop?
I'm quite new to R, so this code probably has lots of issues. What can be improved about it?
Upvotes: 1
Views: 776
Reputation: 1369
collapse::flag
supports fast lagging of irregular time series and panels, see also my answer here. To get your exact result, you would have to fill the missing values introduced by flag
with a function such as data.table::nafill
with option "locf"
. The combination of these two functions is likely going to be the most parsimonious and efficient solution - compared to what was suggested previously.
Upvotes: 0
Reputation: 14360
What you want is not clear, give a pseudo code or a formula. It looks like this is what you want... From what I understand from you the last value should be NA
library(data.table)
setDT(ts,key='time')
ts_delayed = ts[,.(time_delayed=time+2)]
setkey(ts_delayed,time_delayed)
ts[ts_delayed,roll=-Inf]
Upvotes: 1
Reputation: 4024
This should work for your data. If you want to make a general function, you'll have to play around with lazyeval, which honestly might not be worth it.
library(dplyr)
library(zoo)
carry_back = . %>% na.locf(na.rm = TRUE, fromLast = FALSE)
data_frame(time =
with(ts,
seq(first(time),
last(time) ) ) ) %>%
left_join(ts) %>%
transmute(value_2 = carry_back(value),
time = time - delay) %>%
right_join(ts) %>%
mutate(value_2 =
value_2 %>%
is.na %>%
ifelse(last(value), value_2) )
Upvotes: 0
Reputation: 9618
You could try:
library(dplyr)
library(zoo)
na.locf(ts$value[sapply(ts$time, function(x) min(which(ts$time - x >=2 )))])
[1] 3 4 4 5 6 8 8 9 10 10
Upvotes: 2