Reputation: 12188
I have a dataset of the following form.
        country            datetime       x
1 United States 2008-01-01 00:00:00 5962.06
2 United States 2008-01-02 00:00:00 6002.74
3 United States 2008-01-03 00:00:00 6040.98
4 United States 2008-01-04 00:00:00 6031.44
5 United States 2008-01-05 00:00:00 6029.91
6 United States 2008-01-06 00:00:00 6025.24
For me, the time of day (hours, minutes, seconds) and the day of the week are irrelevant, but I want to aggregate the values of variable "x" by country, year and month. Is there any straightforward way of doing this?
Upvotes: 2
Views: 6952
Reputation: 4511
You can use zoo::as.yearmon:
library(zoo)
aggregate(x ~ country * as.yearmon(datetime), FUN=mean, data=dat)
  as.yearmon(datetime)       country        x
1             ene 2008 United States 6015.395
Upvotes: 1
Reputation: 263481
Given Andrie's better solution, this is mainly an illustration of POSIXlt. Using the assumptions about the classes of your variables noted above and using mean as the aggregating function:
aggregate(dfrm$x, list(dfrm$country, as.POSIXlt(dfrm$datetime)$year,
as.POSIXlt(dfrm$datetime)$mon), FUN=mean)
        Group.1 Group.2 Group.3        x
1 United States     108       0 6015.395
Note that one could add 1900 to the POSIXlt year value to recover the calendar year, use the month value (which is zero-based, so add 1) as an index into the R constant vector 'month.abb', and add nice column labels:
aggregate(dfrm$x, list(Country=dfrm$country,
Year=1900+as.POSIXlt(dfrm$datetime)$year,
Month=month.abb[1+as.POSIXlt(dfrm$datetime)$mon]),
FUN=mean)
        Country Year Month        x
1 United States 2008   Jan 6015.395
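For reference, here is a minimal sketch of the two POSIXlt fields used above. The single timestamp is taken from the question's sample data, and the UTC time zone is an assumption made only so the result does not depend on the local setting.

```r
# POSIXlt stores a broken-down time as a list of fields:
# 'year' is years since 1900, 'mon' is the month as 0-11
lt <- as.POSIXlt("2008-01-03 00:00:00", tz = "UTC")

lt$year                 # 108
lt$mon                  # 0
1900 + lt$year          # 2008
month.abb[1 + lt$mon]   # "Jan"
```

This is why the aggregation above adds 1900 to the year and 1 to the month before labelling the groups.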
Upvotes: 1
Reputation: 179558
The easiest way is possibly to use strftime to format your datetime as a character vector that contains only the year and month.
Assuming your column datetime is of class POSIXct, and that your data.frame is called dat:
dat$shortdate <- strftime(dat$datetime, format="%Y/%m")
dat
        country   datetime       x shortdate
1 United States 2008-01-01 5962.06   2008/01
2 United States 2008-01-02 6002.74   2008/01
3 United States 2008-01-03 6040.98   2008/01
4 United States 2008-01-04 6031.44   2008/01
5 United States 2008-01-05 6029.91   2008/01
6 United States 2008-01-06 6025.24   2008/01
Then it's a simple matter to use your favourite aggregation method to summarise the data. For example, using plyr:
library(plyr)
ddply(dat, .(shortdate), summarize, mean_x=mean(x))
  shortdate   mean_x
1   2008/01 6015.395
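The ddply call above groups by shortdate only, whereas the question also asks for grouping by country. A possible sketch with both keys, using base R's aggregate instead of plyr, might look like the following. The data frame is rebuilt from the question's six sample rows so the block runs on its own; the UTC time zone and the name res are assumptions for illustration.

```r
# Rebuild the sample data from the question (datetime as POSIXct)
dat <- data.frame(
  country  = "United States",
  datetime = as.POSIXct(sprintf("2008-01-%02d", 1:6), tz = "UTC"),
  x        = c(5962.06, 6002.74, 6040.98, 6031.44, 6029.91, 6025.24)
)

# Year/month key as a character vector, as in the strftime step above
dat$shortdate <- format(dat$datetime, "%Y/%m")

# Mean of x for every combination of country and year/month
res <- aggregate(x ~ country + shortdate, data = dat, FUN = mean)
res
```

With more than one country in the data, this would return one row per country and month rather than one row per month.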
Upvotes: 4