Aggregate on two different variables

Question

I have a dataset with lots of observations that has 3 different fields:

Consumption, Date, Hour
100, 17/12/2010, 00
120, 17/12/2010, 01
105, 17/12/2010, 02
110, 17/12/2010, 03
...
140, 17/12/2010, 23
130, 18/12/2010, 00
145, 18/12/2010, 01
...

What I want to achieve is create a new dataset with these 3 fields:

Consumption, Date, Period
103.5, 17/12/2010, Night
104.9, 17/12/2010, Morning
104.9, 17/12/2010, Afternoon

so that all the observation of each day between hours 00-08 belong to Night, 08-16 to Morning and 16-23 to Afternoon. The field Consumption of my new dataset will store the mean of the values of observations between the corresponding hours. So finally, I want to have only 3 observations per day instead of 24 that I got now. I think I can achieve this using the aggregate() function but I don't know to use it correctly to achieve it. Am I looking at the right direction? How can this be done?

MrFlick · Accepted Answer

First, it would be helpful to have more complete sample data

set.seed(15)
dd<-data.frame(
    Consumption=rpois(10*12,100),
    Date=rep(seq(as.Date("2012-01-01"), as.Date("2012-01-10"), by="1 day"), each=12),
    Hour=rep(seq(1,24, by=2), 10)
)

the first step would be to create a grouping vairable for the hours. You can do that easily with cut()

dd$Part <- cut(dd$Hour, 
    breaks=c(0,8, 16, 23), 
    labels=c("Night","Morning","Afternoon"), 
    include.lowest=T)

Then you could use aggregate()

aggregate(Consumption~Part+Date,dd, mean)

which returns

        part       Date Consumption
1      Night 2012-01-01       94.75
2    Morning 2012-01-01      103.00
3  Afternoon 2012-01-01      101.75
4      Night 2012-01-02      105.50
5    Morning 2012-01-02       95.00
...

Aggregate on two different variables

Answers (1)

Related Questions