mikebmassey

Reputation: 8584

R zoo - aggregating many records with the same time entry

I regularly need to aggregate transaction data by day, week, month, quarter, and year; essentially, it's time-series data. I started applying zoo/xts to my data in the hope of aggregating it faster, but either I don't fully understand what the packages are for or I'm applying them incorrectly.

In general, I would like to calculate the number of orders and the number of products ordered, by category and by time period (day, week, month, etc.).

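# Packages used below: ddply() comes from plyr, cast() from reshape
library(plyr)
library(reshape)

# Create the data
set.seed(1)  # arbitrary seed so the random sample is reproducible
clients <- 1:10
dates <- seq(as.Date("2012/1/1"), as.Date("2012/9/1"), "days")
categories <- LETTERS[1:5]
# numProducts = 1:10 is recycled to fill all 1000 rows
products <- data.frame(numProducts = 1:10, 
                       category = sample(categories, 1000, replace = TRUE),
                       clientID = sample(clients, 1000, replace = TRUE), 
                       OrderDate = sample(dates, 1000, replace = TRUE))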

I can do this with plyr and reshape, but it feels like a roundabout way of getting there.

# Aggregate by date and category
products.day <- ddply(products, .(OrderDate, category), summarize,
                      numOrders = length(numProducts),
                      numProducts = sum(numProducts))

# Aggregate by month and category
products.month <- ddply(products, .(Month = months(OrderDate), Category = category), summarize,
                        numOrders = length(numProducts),
                        numProducts = sum(numProducts))

# Make a wide version of the data frame (numProducts, the last column, is the value being summed)
products.month.wide <- cast(products.month, Month ~ Category, sum, value = "numProducts")

I tried to apply zoo to the data like so:

products.TS <- aggregate(products$numProducts, yearmon, mean) 

It returned this error:

Error in aggregate.data.frame(as.data.frame(x), ...) : 
  'by' must be a list

I've read the zoo vignettes and documentation, but every example I've found has only one record per time entry.

Do I have to pre-aggregate the data before turning it into a time series? I was hoping I could simply group by the fields I want and then have the months or quarters laid out along the x-axis.

Is there a better approach to aggregating this or a more appropriate package?

Upvotes: 2

Views: 456

Answers (1)

Joshua Ulrich

Reputation: 176648

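products$numProducts is a vector, not a zoo object, so aggregate() falls through to aggregate.data.frame, which is where the "'by' must be a list" error comes from. You need to create a zoo object first so that method dispatch calls aggregate.zoo, which accepts a function of the index (such as as.yearmon) as its by argument.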

library(zoo)
# Build a zoo series indexed by order date, then average numProducts per month
pz <- with(products, zoo(numProducts, OrderDate))
products.TS <- aggregate(pz, as.yearmon, mean)
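
The same pattern should extend to the counts and totals asked about in the question. What follows is only a minimal sketch under some assumptions: it rebuilds sample data shaped like the question's (with an arbitrary seed), and the result names (prod.month, orders.month, prod.qtr, prod.month.wide) are made up for illustration. zoo() warns when the index contains duplicate dates, which is expected here because aggregate() is what collapses them, so the warning is suppressed.

```r
library(zoo)

# Hypothetical sample data shaped like the data in the question
set.seed(42)  # arbitrary seed, only for reproducibility
dates <- seq(as.Date("2012-01-01"), as.Date("2012-09-01"), by = "day")
products <- data.frame(
  numProducts = sample(1:10, 1000, replace = TRUE),
  category    = sample(LETTERS[1:5], 1000, replace = TRUE),
  OrderDate   = sample(dates, 1000, replace = TRUE)
)

# One observation per order; duplicate dates trigger a warning from zoo()
pz <- suppressWarnings(with(products, zoo(numProducts, OrderDate)))

# Products ordered and number of orders per month
prod.month   <- aggregate(pz, as.yearmon, sum)
orders.month <- aggregate(pz, as.yearmon, length)

# Products ordered per quarter: only the index function changes
prod.qtr <- aggregate(pz, as.yearqtr, sum)

# Per-category monthly totals: aggregate each category separately,
# then merge the series into one wide zoo object (one column per category).
# fill = 0 puts zeros where a category has no orders in a month.
by.cat <- lapply(split(products, products$category), function(d) {
  z <- suppressWarnings(zoo(d$numProducts, d$OrderDate))
  aggregate(z, as.yearmon, sum)
})
prod.month.wide <- do.call(merge, c(by.cat, fill = 0))
```

So there is no need to pre-aggregate before creating the zoo object: the raw orders go in with their dates, and the choice of index function (as.yearmon, as.yearqtr, or a custom one) decides the period.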

Upvotes: 4
