Homunculus Reticulli

Reputation: 68426

Aggregating a timestamped zoo object by clock time (i.e. not solely by time in the zoo object)

I have a zoo object which consists of a timestamped (to the second) timeseries. The timeseries is irregular in that the time intervals between the values are not regularly spaced.

I would like to transform the irregularly spaced timeseries object into a regularly spaced one, where the time interval between values is constant (say 15 minutes) and the timestamps fall on "real world" clock times.

Some sample data may help illustrate further

# Sample data
2011-05-05 09:30:04 101.32
2011-05-05 09:30:14 100.09
2011-05-05 09:30:19 99.89
2011-05-05 09:30:35 89.66
2011-05-05 09:30:45 95.16
2011-05-05 09:31:12 100.28
2011-05-05 09:31:50 100.28
2011-05-05 09:32:10 98.28

I'd like to aggregate them (using my custom function) for every specified time period (e.g. 30 second time bucket) such that the output looks like the table presented below.

The key is that I want to aggregate every 30 seconds by clock time NOT 30 seconds starting from my first observation time. Naturally, the first time bucket would be the first time bucket for which I have a recorded observation (i.e. row) in the data to be aggregated.

2011-05-05 09:30:00   101.32
2011-05-05 09:30:30   89.66
2011-05-05 09:31:00   100.28

In the example given, my custom aggregate function simply returns the first value in the 'set' of 'selected rows' to aggregate over.

Upvotes: 3

Views: 1260

Answers (4)

G. Grothendieck

Reputation: 269596

Read in the data and then aggregate it by minute:

Lines <- "2011-05-05 09:30:04 101.32
2011-05-05 09:30:14 100.09
2011-05-05 09:30:19 99.89
2011-05-05 09:30:35 89.66
2011-05-05 09:30:45 95.16
2011-05-05 09:31:12 100.28
2011-05-05 09:31:50 100.28
2011-05-05 09:32:10 98.28"

library(zoo)
library(chron)
toChron <- function(d, t) as.chron(paste(d, t))
z <- read.zoo(text = Lines, index = 1:2, FUN = toChron)
aggregate(z, trunc(time(z), "00:01:00"), mean)

The result is:

(05/05/11 09:30:00) (05/05/11 09:31:00) (05/05/11 09:32:00) 
             97.224             100.280              98.280 
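The same approach can be adapted to the 30-second buckets and "first value" function described in the question. The following is only a sketch under that assumption: `first_value` and `res30` are names introduced here for illustration, and the truncation unit "00:00:30" is passed to `trunc` in the same way as "00:01:00" above.

```r
library(zoo)
library(chron)

Lines <- "2011-05-05 09:30:04 101.32
2011-05-05 09:30:14 100.09
2011-05-05 09:30:19 99.89
2011-05-05 09:30:35 89.66
2011-05-05 09:30:45 95.16
2011-05-05 09:31:12 100.28
2011-05-05 09:31:50 100.28
2011-05-05 09:32:10 98.28"

# Combine the date and time columns into a single chron index
toChron <- function(d, t) as.chron(paste(d, t))
z <- read.zoo(text = Lines, index = 1:2, FUN = toChron)

# Hypothetical custom aggregate: first observation in each bucket
first_value <- function(x) x[1]

# Truncate each timestamp down to a 30-second clock boundary, then aggregate
res30 <- aggregate(z, trunc(time(z), "00:00:30"), first_value)
print(res30)
```

Only buckets that contain at least one observation appear in the result, which matches the behaviour asked for in the question.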

Upvotes: 5

IRTFM

Reputation: 263342

I hope we can assume this is in a zoo or xts object. If so then try this:

  # First get a start for a set of intervals, need to use your tz
beg<- as.POSIXct( format(index(dat[1,]), "%Y-%m-%d %H:%M", tz="EST5EDT"))
  # Then create a sequence of 30 second intervals
tseq <- beg+seq(0,4*30, by=30)
  # Then this will create a vector that you can use for your aggregation function
findInterval(index(dat), tseq)
  #[1] 1 1 1 2 2 3 4 5
  # To find the first row in a subset of rows from tapply, try "[" with 1
tapply(dat, findInterval(index(dat), tseq), "[", 1)
  #     1      2      3      4      5 
  #101.32  89.66 100.28 100.28  98.28 

Upvotes: 2

Pierre Lapointe

Reputation: 16277

You should look at align.time in xts. It does something very close to what you want to achieve.

my.data <- read.table(text="date,x
2011-05-05 09:30:04,101.32
2011-05-05 09:30:14,100.09
2011-05-05 09:30:19,99.89
2011-05-05 09:30:35,89.66
2011-05-05 09:30:45,95.16
2011-05-05 09:31:12,100.28
2011-05-05 09:31:50,100.28
2011-05-05 09:32:10,98.28", header=TRUE, as.is=TRUE,sep = ",")

library(xts)  # must be loaded before xts() is called

my.data <- xts(my.data[,2],as.POSIXlt(my.data[,1],format="%Y-%m-%d %H:%M:%S"))

res <-align.time(my.data,30)
res[!duplicated(index(res)),]

                      [,1]
2011-05-05 09:30:30 101.32
2011-05-05 09:31:00  89.66
2011-05-05 09:31:30 100.28
2011-05-05 09:32:00 100.28
2011-05-05 09:32:30  98.28

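Note that align.time rounds each timestamp up to the end of its 30-second interval, so the index shows interval end times. You can shift the index back by 30 seconds if that makes the interpretation clearer.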

Upvotes: 0

Simon Urbanek

Reputation: 13932

I would simply truncate the times down to your interval. Assuming t is the time as POSIXct (use as.POSIXct if it's not):

bucket = t - as.numeric(t) %% 30

then you can aggregate over bucket, like aggregate(value, list(bucket), sum)

(I don't use zoo so this is with pure R)
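As a minimal sketch of the above applied to the sample data from the question, using base R only: the vectors `t` and `value` are built by hand here, the UTC time zone is an assumption, and `first_value` is a name introduced for illustration in place of `sum`.

```r
# Sample data from the question; tz = "UTC" is an assumption
t <- as.POSIXct(c("2011-05-05 09:30:04", "2011-05-05 09:30:14",
                  "2011-05-05 09:30:19", "2011-05-05 09:30:35",
                  "2011-05-05 09:30:45", "2011-05-05 09:31:12",
                  "2011-05-05 09:31:50", "2011-05-05 09:32:10"),
                tz = "UTC")
value <- c(101.32, 100.09, 99.89, 89.66, 95.16, 100.28, 100.28, 98.28)

# Truncate each timestamp down to the start of its 30-second clock bucket
bucket <- t - as.numeric(t) %% 30

# Hypothetical custom aggregate: first value observed in each bucket
first_value <- function(x) x[1]

# One row per non-empty bucket; the aggregated values are in column x
res <- aggregate(value, list(bucket = bucket), first_value)
print(res)
```

Because the buckets are derived from seconds since the epoch, they line up with clock time rather than with the first observation.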

Upvotes: 1
