stevejb

Reputation: 2444

How to aggregate this data in R

I have a data frame in R with the following structure.

> testData
            date exch.code comm.code     oi
1     1997-12-30       CBT         1 468710
2     1997-12-23       CBT         1 457165
3     1997-12-19       CBT         1 461520
4     1997-12-16       CBT         1 444190
5     1997-12-09       CBT         1 446190
6     1997-12-02       CBT         1 443085
....
77827 2004-10-26      NYME       967  10038
77828 2004-10-19      NYME       967   9910
77829 2004-10-12      NYME       967  10195
77830 2004-09-28      NYME       967   9970
77831 2004-08-31      NYME       967   9155
77832 2004-08-24      NYME       967   8655

What I want to do is produce a table that shows, for a given date and commodity, the total oi across every exchange code. So, the rows would be made up of

unique(testData$date)

and the columns would be

unique(testData$comm.code)

and each cell would be the total oi over all exch.codes on a given day.

Thanks,

Upvotes: 6

Views: 3087

Answers (3)

mnel

Reputation: 115465

A data.table solution

library(data.table)
DT <- data.table(testData)
DT[,sum(oi), by = list(date,comm.code)]
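The call above returns a long table, one row per date and comm.code pair, with the sum in an automatically named column (V1). A minimal sketch of going on to the wide layout the question asks for, assuming data.table's dcast() and a small made-up stand-in for testData (the values are invented for illustration):

```r
library(data.table)

# Toy data standing in for testData (values are made up)
testData <- data.frame(
  date      = as.Date(c("1997-12-30", "1997-12-30", "1997-12-30", "1997-12-23")),
  exch.code = c("CBT", "NYME", "CBT", "CBT"),
  comm.code = c(1, 1, 967, 1),
  oi        = c(100, 50, 10, 70)
)

DT <- as.data.table(testData)

# Long format: one row per (date, comm.code), with a named sum column
long <- DT[, .(oi = sum(oi)), by = .(date, comm.code)]

# Wide format: one row per date, one column per comm.code;
# date/commodity combinations that never occur are filled with 0
wide <- dcast(long, date ~ comm.code, value.var = "oi", fill = 0)
```

Whether 0 or NA is the right filler for missing combinations depends on how the table will be used; dropping the fill argument leaves them as NA.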

Upvotes: 5

John

Reputation: 23758

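# aggregate: total oi per date and comm.code, summed over all exchanges
df1 <- aggregate(oi ~ date + comm.code, testData, sum)

# rearrange it so that it's like you requested
# (this assumes every date appears for every comm.code; otherwise the
# matrix columns will not line up with the dates)
uc <- unique(df1$comm.code)
dfw <- with(df1, data.frame(date = unique(date), matrix(oi, ncol = length(uc))))
names(dfw) <- c('date', uc)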

This will be much faster than the equivalent plyr command. There are also ways to do the rearranging in a one-liner, and the rearranging part is very fast.
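The matrix-based rearranging above only lines up when every date occurs for every commodity code, which may not hold for data spanning different years and exchanges. A minimal sketch of a base R alternative that tolerates gaps, using xtabs() on a small made-up stand-in for testData (values invented for illustration):

```r
# Toy data standing in for testData (values are made up)
testData <- data.frame(
  date      = as.Date(c("1997-12-30", "1997-12-30", "1997-12-30", "1997-12-23")),
  exch.code = c("CBT", "NYME", "CBT", "CBT"),
  comm.code = c(1, 1, 967, 1),
  oi        = c(100, 50, 10, 70)
)

# xtabs sums oi within each (date, comm.code) cell;
# combinations that do not occur in the data come out as 0
tab <- xtabs(oi ~ date + comm.code, data = testData)

# Turn the table into a plain data frame with an explicit date column
dfw <- data.frame(date = as.Date(rownames(tab)), unclass(tab), check.names = FALSE)
rownames(dfw) <- NULL
```

Here check.names = FALSE keeps the commodity codes as column names ("1", "967") instead of letting data.frame() rewrite them as X1, X967.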

Upvotes: 10

Dirk is no longer here

Reputation: 368499

The plyr package is good at this, and you should be able to get this done with a single ddply() call. Something like (untested)

ddply(testData, .(date,comm.code), function(x) sum(x$oi))

should work.
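For what it is worth, a minimal sketch that exercises this on a small made-up stand-in for testData (values invented for illustration). It uses plyr's summarise helper so the result column gets an explicit name rather than the default V1; the result is still in long format, one row per date and comm.code pair:

```r
library(plyr)

# Toy data standing in for testData (values are made up)
testData <- data.frame(
  date      = as.Date(c("1997-12-30", "1997-12-30", "1997-12-30", "1997-12-23")),
  exch.code = c("CBT", "NYME", "CBT", "CBT"),
  comm.code = c(1, 1, 967, 1),
  oi        = c(100, 50, 10, 70)
)

# Sum oi within each (date, comm.code) group, across all exchange codes
res <- ddply(testData, .(date, comm.code), summarise, oi = sum(oi))
```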

Upvotes: 11
