user702432

Reputation: 12188

Aggregate by year and month for a POSIX variable

I have a dataset of the following form.

        country            datetime       x
1 United States 2008-01-01 00:00:00 5962.06
2 United States 2008-01-02 00:00:00 6002.74
3 United States 2008-01-03 00:00:00 6040.98
4 United States 2008-01-04 00:00:00 6031.44
5 United States 2008-01-05 00:00:00 6029.91
6 United States 2008-01-06 00:00:00 6025.24

The time of day (hours, minutes, seconds) and the day of the week are irrelevant to me; I want to aggregate the values of variable "x" by country, year and month. Is there a straightforward way of doing this?

Upvotes: 2

Views: 6952

Answers (3)

Oscar Perpiñán

Reputation: 4511

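You can use zoo::as.yearmon (load the zoo package first):

 library(zoo)
 aggregate(x ~ country * as.yearmon(datetime), FUN=mean, data=dat)

 as.yearmon(datetime)       country        x
1             ene 2008 United States 6015.395

The month prints as "ene" (enero, i.e. January) because this output was produced under a Spanish locale.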

Upvotes: 1

IRTFM

Reputation: 263481

Given Andrie's better solution, this is mainly an illustration of POSIXlt. Using the assumptions about the classes of your variables stated in Andrie's answer (with the data frame called dfrm here), and mean as the aggregating function:

aggregate(dfrm$x, list(dfrm$country, as.POSIXlt(dfrm$datetime)$year, 
                       as.POSIXlt(dfrm$datetime)$mon), FUN=mean)
         Group.1 Group.2 Group.3        x
1  United States     108       0 6015.395

Note that one could add 1900 to the POSIXlt year value to recover the calendar year, use the month value (which runs from 0 to 11) plus 1 as an index into the R constant vector 'month.abb', and add nicer column labels:

aggregate(dfrm$x, list(Country=dfrm$country, 
                       Year=1900+as.POSIXlt(dfrm$datetime)$year, 
                       Month=month.abb[1+as.POSIXlt(dfrm$datetime)$mon]), 
          FUN=mean)
         Country Year Month        x
1  United States 2008   Jan 6015.395
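
The repeated as.POSIXlt() calls can be avoided by converting once and reusing the result. A minimal sketch, assuming the data frame dfrm is rebuilt from the six rows shown in the question (the construction below, the UTC time zone and the names lt and res are assumptions for illustration, not part of the original data):

```r
# Rebuild the question's sample rows (assumed construction; UTC chosen arbitrarily)
dfrm <- data.frame(
  country  = "United States",
  datetime = as.POSIXct(sprintf("2008-01-%02d", 1:6), tz = "UTC"),
  x        = c(5962.06, 6002.74, 6040.98, 6031.44, 6029.91, 6025.24)
)

# Convert once; POSIXlt stores years since 1900 and months as 0-11
lt <- as.POSIXlt(dfrm$datetime)

res <- aggregate(dfrm$x,
                 list(Country = dfrm$country,
                      Year    = 1900 + lt$year,
                      Month   = month.abb[1 + lt$mon]),
                 FUN = mean)
res
```

With more than one month in the data, the character labels from month.abb would likely sort alphabetically rather than chronologically; keeping the numeric value 1 + lt$mon as the grouping variable should avoid that.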

Upvotes: 1

Andrie

Reputation: 179558

The easiest way is possibly to use strftime to format your datetime as a character vector that contains only the year and month.

Assuming your column datetime is of class POSIXct, and that your data.frame is called dat:

dat$shortdate <- strftime(dat$datetime, format="%Y/%m")
dat
        country   datetime       x shortdate
1 United States 2008-01-01 5962.06   2008/01
2 United States 2008-01-02 6002.74   2008/01
3 United States 2008-01-03 6040.98   2008/01
4 United States 2008-01-04 6031.44   2008/01
5 United States 2008-01-05 6029.91   2008/01
6 United States 2008-01-06 6025.24   2008/01

Then it's a simple matter to use your favourite aggregation method to summarise the data. For example, using plyr:

library(plyr)
ddply(dat, .(country, shortdate), summarize, mean_x=mean(x))

        country shortdate   mean_x
1 United States   2008/01 6015.395
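
The same summary can also be computed in base R without plyr, grouping by country as well as by year and month, as the question asks. A minimal sketch, assuming dat is rebuilt from the six rows in the question (the construction, the UTC time zone and the name res are assumptions for illustration):

```r
# Rebuild the question's sample rows (assumed construction; UTC chosen arbitrarily)
dat <- data.frame(
  country  = "United States",
  datetime = as.POSIXct(sprintf("2008-01-%02d", 1:6), tz = "UTC"),
  x        = c(5962.06, 6002.74, 6040.98, 6031.44, 6029.91, 6025.24)
)

# Year/month key as a character vector, e.g. "2008/01"
dat$shortdate <- format(dat$datetime, "%Y/%m")

# Mean of x for every country and year/month combination
res <- aggregate(x ~ country + shortdate, data = dat, FUN = mean)
res
```

Because the key is formatted as year first and month zero-padded, sorting the character values also sorts them chronologically.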

Upvotes: 4
