bmonger

Reputation: 77

Comparing time series with different timestamps in R

The problem I have is as follows:

How can we check which values were observed at a given point in time across all processes? And how can we produce periodic aggregates (e.g. a mean daily value) from the known observations?

Here is a mockup of the data:

tsVector1 <- c("2020-01-01 16:18", "2020-01-03 12:12", "2020-01-03 14:56", "2020-01-10 09:48", "2020-01-11 23:00")
tsVector2 <- c("2020-01-01 16:18", "2020-01-02 14:10", "2020-01-04 18:36", "2020-01-06 19:23", "2020-01-10 21:10")
tsVector3 <- c("2020-01-01 16:18", "2020-01-05 13:37", "2020-01-08 01:11")
obsVector1 <- c(1, 1.2, 1.6, 3, 3.8)
obsVector2 <- c(1, 1.4, 1.7, 1.8, 2.1)
obsVector3 <- c(1:3)
procVector1 <- rep("proc1", 5)
procVector2 <- rep("proc2", 5)
procVector3 <- rep("proc3", 3)
tsDF <- data.frame(ts = as.POSIXct(c(tsVector1, tsVector2, tsVector3)),
                   observation = c(obsVector1, obsVector2, obsVector3), 
                   process = c(procVector1, procVector2, procVector3))

I am looking for an output like this:

SomeHelpfulFunction(tsDF, "2020-01-07 01:00")
> process  observation
>   proc1  1.6
>   proc2  1.8
>   proc3  2.0

Is there a way to get this data without resorting to loops or replication of all unique timestamps across all of the time series being compared?
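The second part of the question, periodic aggregates, can be sketched with base R's aggregate() on the mockup data above (assuming a plain mean per process per calendar day is what is wanted):

```r
# Collapse each process to one mean observation per calendar day
tsDF$day <- as.Date(tsDF$ts)
aggregate(observation ~ process + day, data = tsDF, FUN = mean)
```

Days without observations simply do not appear in the result; filling them in would need an explicit join against a calendar table.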


UPDATE: Having had a look at the replies and tested them on a bigger dataset, I found that the data.table method works best for my use case and also gives the fastest response times.

Upvotes: 0

Views: 571

Answers (3)

bmonger

Reputation: 77

I have found an additional way of achieving the result with base R alone. It relies on duplicated() and only returns the last reading per process, so it will not work for applications where the last N readings are required:

SomeHelpfulFunction <- function(tsDF, time) {
    # Sort by process, then timestamp, and drop observations after `time`
    functionLocalDF <- tsDF[order(tsDF$process, tsDF$ts), ]
    functionLocalDF <- functionLocalDF[functionLocalDF$ts <= time, ]
    # Keep only the last remaining row for each process
    functionLocalDF[!duplicated(functionLocalDF$process, fromLast = TRUE), ]
}
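Called on the mockup data from the question, this should give the same three rows as the other answers (the timestamp string is coerced to POSIXct by the comparison):

```r
SomeHelpfulFunction(tsDF, "2020-01-07 01:00")
# one row per process: proc1 -> 1.6, proc2 -> 1.8, proc3 -> 2
```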

Upvotes: 0

Vincent

Reputation: 17725

Here's a cute solution using a rolling join in data.table:

library(data.table)

SomeHelpfulFunction <- function(tsDF, time) {
  # The observations, keyed for the join
  a <- data.table(tsDF, key = c("process", "ts"))
  # One lookup row per process at the query time
  b <- data.table(process = unique(tsDF$process),
                  ts = as.POSIXct(time),
                  key = c("process", "ts"))
  # Rolling join: each lookup row picks up the last observation at or before its ts
  a[b, roll = TRUE]
}

SomeHelpfulFunction(tsDF, "2020-01-07 1:00")
#>                     ts observation process
#> 1: 2020-01-07 01:00:00         1.6   proc1
#> 2: 2020-01-07 01:00:00         1.8   proc2
#> 3: 2020-01-07 01:00:00         2.0   proc3

I found this blog post useful in explaining rolling joins: https://r-norberg.blogspot.com/2016/06/understanding-datatable-rolling-joins.html
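The same rolling join extends to several query times at once, which avoids looping over timestamps entirely. A sketch, using data.table's CJ() to pair every process with every query time:

```r
library(data.table)

times <- as.POSIXct(c("2020-01-07 01:00", "2020-01-10 12:00"))
a <- data.table(tsDF, key = c("process", "ts"))
b <- CJ(process = unique(tsDF$process), ts = times)  # keyed cross join
a[b, roll = TRUE]  # one row per process per query time
```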

Upvotes: 1

Bas

Reputation: 4658

I think you want the last value before the time you are querying, for every process. With dplyr:

library(dplyr)

tsDF %>% 
  group_by(process) %>% 
  arrange(ts) %>% 
  filter(ts <= "2020-01-07 01:00") %>% 
  slice_tail() %>% 
  ungroup() %>% 
  select(-ts)

which gives

# A tibble: 3 x 2
  observation process
        <dbl> <fct>  
1         1.6 proc1  
2         1.8 proc2  
3         2   proc3  

Upvotes: 1
