Reputation: 8584
I consistently need to take transaction data and aggregate it by Day, Week, Month, Quarter, Year - essentially, it's time-series data. I started to apply zoo/xts to my data in hopes I could aggregate the data faster, but I either don't fully understand the packages' purpose or I'm trying to apply it incorrectly.
In general, I would like to calculate the number of orders and the number of products ordered by category, by time period (day, week, month, etc).
# Create the data
clients <- 1:10
dates <- seq(as.Date("2012/1/1"), as.Date("2012/9/1"), "days")
categories <- LETTERS[1:5]
products <- data.frame(numProducts = 1:10,
                       category = sample(categories, 1000, replace = TRUE),
                       clientID = sample(clients, 1000, replace = TRUE),
                       OrderDate = sample(dates, 1000, replace = TRUE))
I could do this with plyr and reshape, but I think this is a roundabout way to do so.
library(plyr)     # ddply, summarize
library(reshape)  # cast
# Aggregate by date and category
products.day <- ddply(products, .(OrderDate, category), summarize,
                      numOrders = length(numProducts), numProducts = sum(numProducts))
# Aggregate by month and category
products.month <- ddply(products, .(Month = months(OrderDate), Category = category), summarize,
                        numOrders = length(numProducts), numProducts = sum(numProducts))
# Make a wide version of the data frame (one column per category)
products.month.wide <- cast(products.month, Month ~ Category, sum, value = "numProducts")
I tried to apply zoo to the data like so:
products.TS <- aggregate(products$numProducts, yearmon, mean)
It returned this error:
Error in aggregate.data.frame(as.data.frame(x), ...) :
'by' must be a list
I've read the zoo vignettes and documentation, but every example that I've found only shows 1 record/row/entry per time entry.
Do I have to pre-aggregate the data before I turn it into a time series? I was hoping that I could simply group by the fields I want, then have the months or quarters added to the data frame incrementally along the X-axis.
Is there a better approach to aggregating this or a more appropriate package?
Upvotes: 2
Views: 456
Reputation: 176648
products$numProducts is a vector, not a zoo object. You'd need to create a zoo object before you can use method dispatch to call aggregate.zoo.
library(zoo)
# Build a zoo series indexed by order date, then aggregate it by month
pz <- with(products, zoo(numProducts, OrderDate))
products.TS <- aggregate(pz, as.yearmon, mean)
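If you also want the per-category, per-month layout from the question, one possible approach (a rough sketch, not checked against real data) is to split the data frame by category, aggregate each piece with aggregate.zoo, and merge the results into one wide zoo object. The fixed seed, the helper name monthly_wide, and the result names are assumptions made for illustration. suppressWarnings is there only because zoo warns when the raw index contains duplicate dates, which is expected before aggregating.

```r
library(zoo)

set.seed(42)  # assumption: fixed seed so the sample data is reproducible
dates <- seq(as.Date("2012-01-01"), as.Date("2012-09-01"), by = "day")
products <- data.frame(
  numProducts = 1:10,
  category    = sample(LETTERS[1:5], 1000, replace = TRUE),
  OrderDate   = sample(dates, 1000, replace = TRUE)
)

# Hypothetical helper: one monthly zoo series per category, merged column-wise
monthly_wide <- function(df, FUN) {
  by_cat <- lapply(split(df, df$category), function(d) {
    # Duplicate dates are expected here; aggregate() collapses them below
    z <- suppressWarnings(zoo(d$numProducts, d$OrderDate))
    aggregate(z, as.yearmon, FUN)
  })
  # Named list elements become the column names of the merged object
  do.call(merge, by_cat)
}

monthly_products <- monthly_wide(products, sum)     # products ordered per month
monthly_orders   <- monthly_wide(products, length)  # number of orders per month

print(monthly_products)
```

Swapping as.yearmon for as.yearqtr inside the helper should give quarterly figures the same way. Months with no orders in a category come out as NA after the merge.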
Upvotes: 4