Reputation: 68486

R zoo object time series aggregation

I have an R zoo object. The zoo object (z) is indexed by date and has multiple columns:

V1 (aggregate value is the sum of all values in 'selected' rows)
V2 (aggregate value is q1 [first quartile] of all values in 'selected' rows)
V3 (aggregate value is the minimum of all values in 'selected' rows)
V4 (aggregate value is the first of the values in 'selected' rows)
V5 (aggregate value is the last of the values in 'selected' rows)

I want to aggregate the data in each 'column' differently (i.e. using different functions), but aggregating over the same number of rows.

I want to aggregate using a function that allows me to specify the number of rows over which to aggregate. For example:

my_aggregate <- function(data, agg_rowcount) {
  # aggregate data over [agg_rowcount] rows....
  return (aggregated_data)
}

I initially thought of implementing this function by using the aptly named aggregate() function - but I could not get it to do what I wanted.

A simple example showing the error I get when using aggregate() is as follows:

> indices <- seq.Date(as.Date('2000-01-01'),as.Date('2000-01-30'),by="day")
> a <- zoo(rnorm(30), order.by=indices)
> b <- zoo(rnorm(30), order.by=indices)
> c <- zoo(rnorm(30), order.by=indices)
> d <- merge(a,b)
> e <- merge(d,c)
> head(e)
                     a          b           c
2000-01-01 -0.07924078  0.6208785 -1.79826472
2000-01-02  1.15956208  1.1867218 -0.02124817
2000-01-03  0.20427523  0.3164863 -0.20153631
2000-01-04  1.21583902 -1.3728278  1.75872854
2000-01-05 -0.32845708  0.3857658 -1.01082787
2000-01-06 -1.95312879 -0.3824591 -1.33220075
>
> aggregate(e,by=e[[1]], nfrequency=8)
Error: length(time(x)) == length(by[[1]]) is not TRUE

So I failed at the very first hurdle. I would appreciate any help writing a function that aggregates different columns differently, across the same number of rows.

Note: I have only been 'messing around' with R for a few days. For all I know, aggregate() may not be the way to solve this problem. I don't want the snippet above to be a red herring: if aggregate() is not the "best" (i.e. recommended R) way to approach this, I am not looking for answers that only fix the error I got from it.

The only reasons I included my attempt above are:

Because I was asked to post a 'reproducible' error.
To show that I had tried to solve it myself before asking here.

Upvotes: 1

Answers (2)

Omar Wagih

Reputation: 8744

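Wouldn't the ddply function in the plyr package help here? It works on data frames, so the zoo object has to be converted first.

To aggregate each column with a different function (here grouping rows into blocks of 8, as an example):

library(plyr)
# ddply() needs a data frame; 'group' labels consecutive blocks of 8 rows
df = data.frame(group = (seq_len(nrow(e)) - 1) %/% 8, coredata(e))
agg = ddply(df, c("group"), function(df) { 
    c( sum(df$a), mean(df$b), tail(df$c, 1) ) 
})
names(agg) = c('group', 'a', 'b', 'c')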

Upvotes: 0

G. Grothendieck

Reputation: 270348

Suppose we wish to aggregate e by week, w, aggregating column a using sum, b using mean and c using the last value in the week:

w <- as.numeric(format(time(e), "%W"))
e.w <- with(e, cbind(a = aggregate(a, w, sum), 
    b = aggregate(b, w, mean), 
    c = aggregate(c, w, tail, 1)
))
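
The question asks for aggregation over a fixed number of rows rather than by week. One possible way to get that is sketched below. This is not the code above but a variation that uses tapply() on row blocks instead of aggregate(). The signature of my_aggregate (in particular the funs argument, a named list mapping column names to functions) is an assumption made for illustration, and each result row is labelled with the first date of its block, which is also just one possible choice.

```r
library(zoo)

# Hypothetical helper: aggregate a multi-column zoo object over consecutive
# blocks of `agg_rowcount` rows, applying a different function to each column.
# `funs` is a named list: names are column names, values are functions that
# reduce a numeric vector to a single value.
my_aggregate <- function(data, agg_rowcount, funs) {
  # Block id for every row: 0,0,...,1,1,... (the last block may be shorter)
  block <- (seq_len(NROW(data)) - 1) %/% agg_rowcount
  # Apply each column's function within each block
  cols <- lapply(names(funs), function(nm) {
    as.vector(tapply(coredata(data[, nm]), block, funs[[nm]]))
  })
  names(cols) <- names(funs)
  # Index the result by the first date of each block
  zoo(do.call(cbind, cols), order.by = time(data)[!duplicated(block)])
}

# Usage on data shaped like the question's example (30 daily rows, 3 columns)
set.seed(1)
indices <- seq(as.Date("2000-01-01"), as.Date("2000-01-30"), by = "day")
e <- zoo(matrix(rnorm(90), ncol = 3, dimnames = list(NULL, c("a", "b", "c"))),
         order.by = indices)

res <- my_aggregate(e, 8, list(
  a = sum,                                            # sum of the block
  b = function(x) quantile(x, 0.25, names = FALSE),   # first quartile
  c = function(x) tail(x, 1)                          # last value in the block
))
```

With 30 rows and agg_rowcount = 8 this gives four result rows, the last one covering only the remaining six days. min and function(x) head(x, 1) can be passed the same way for the minimum and first-value columns.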

Upvotes: 3
