ricardo
ricardo

Reputation: 8425

sub-setting and aggregating xts objects

I have an xts object that contains prices and volumes. I would like to split the data by time period and aggregate the data, such that I obtain a table of the volume traded at each price, for each of the time-period splits.

I have hacked together a partial solution, but it is clunky and slow; and moreover I cannot get it to work using templated names (so I cannot make my function work).

A subset of the xts object appears below:

library(xts)
mn <- 
structure(c(97.9, 97.9, 97.9, 97.9, 97.9, 97.9, 97.9, 97.89, 
97.89, 97.89, 97.89, 97.89, 97.89, 97.9, 97.9, 97.89, 97.9, 97.89, 
97.89, 97.89, 97.89, 9, 60, 71, 5, 3, 21, 5, 192, 65, 73, 1, 
1, 39, 15, 1, 1, 18, 1, 33, 1, 1), .Dim = c(21L, 2L), .Dimnames = list(
    NULL, c("px_ym1", "vol_ym1")), index = structure(c(1338561000, 
1338561000, 1338561000, 1338561000, 1338561000, 1338561000, 1338561060, 
1338561060, 1338561060, 1338561060, 1338561060, 1338561060, 1338561060, 
1338561060, 1338561060, 1338561060, 1338561060, 1338561060, 1338561120, 
1338561240, 1338561240), tzone = "", tclass = c("POSIXct", "POSIXt"
)), .indexCLASS = c("POSIXct", "POSIXt"), tclass = c("POSIXct", 
"POSIXt"), .indexTZ = "", tzone = "", class = c("xts", "zoo"))

I have figured out how to aggregate using the following command:

PxMat <- aggregate(.~px_ym1, data=mn, sum)
  px_ym1 vol_ym1
1  97.89     408
2  97.90     208

and for subsets, I use the following grouping:

PxMat2 <- aggregate(.~px_ym1, data=mn[.indexmin(mn) == '30'], sum)
  px_ym1 vol_ym1
1   97.9     169

I have four specific questions:

1/ My data-sets are HUGE, so speed is an issue. Is this the fastest way to sub-set and aggregate an xts?

2/ is there a way to do this for multiple splits? Say a table for each minute, day, week, or month?

I am writing a function with loops that would generate a table that reports a volume traded at each price for each case of a specified time split (say for four tables, one for each minute in the example data). It seems slow, and like something that someone would have done better already. Is this the case?

3/ Is there a way to do the sub-setting with templated variable? I would like to obtain the table I get with the named aggregate functions (reproduced above) with multiple data frames, as the column names will differ from time to time.

I have tried:

PxMat <- aggregate(.~mn[,1], data=mn, sum)
  px_ym1  px_ym1 vol_ym1
1  97.90 1076.79     408
2  97.89  979.00     208

This is not a disaster, but if there are multiple columns I would like to keep the mess to a minimum. I cannot figure out how to suppress the aggregation of the price series.

4/ on a related note, is it possible to apply different functions to different columns of data? It would be nice, for example, if the table returned was:

px_ym1  count vol_ym1
1  97.90  11     408
2  97.89  10     208

Cross-posted on R-help: https://stat.ethz.ch/pipermail/r-help/2012-June/315499.html

Upvotes: 0

Views: 1050

Answers (1)

IRTFM
IRTFM

Reputation: 263301

I suggested by email to you (and to rhelp although it has not yet appeared ) that you try:

adf <- aggregate(vol_ym1 ~ px_ym1, data=mm, sum)

And that if you wanted to do this anonymously and yet remove the sum of the aggregating column that you could use:

adf <- aggregate(mm[,-1]~mm[,1], data=mm, sum); adf

If you want different columns, then you need to create a function that will operate on the full vector or matrix that is segregated by the split criteria. You will need a richer example to get a specific answer.

Upvotes: 1

Related Questions