Yusri
Yusri

Reputation: 302

Averaging values in a data frame for a certain hour and month in R

I've been scouring the net but haven't found a solution to this quite possibly simple problem.

This is the half-hourly data using the library 'xts',

library(xts)
data.xts <- as.xts(1:nrow(data), as.POSIXct("2007-08-24 17:30:00") +
                  1800 * (1:nrow(data)))
data.xts <-as.data.frame(data.xts)

I changed it to data.frame because the original data is in data.frame format. Actually, in the original data frame, there is a time_stamp column and I prefer if I can just use the time_stamp column instead of using the 'xts' format.

How can I average every hourly data for a month so that I can plot a hourly time series of 24 hours for the different months?

For example,

 2007-08-24 17:30:00  1
 2007-08-25 17:00:00  47
 2007-08-25 17:30:00  48
 2007-08-26 17:00:00  95

would be averaged for the month of August 2007, etc.

Goal is to plot averaged 24-hourly time series for each month.

Thanks!

Upvotes: 0

Views: 1685

Answers (2)

jlhoward
jlhoward

Reputation: 59355

If I understand you correctly, you want to average all the values for a given hour, for all the days in a given month, and do this for all months. So average all the values between midnight and 00:59:59 for all the days in a given month, etc.

I see that you want to avoid xts but aggregate.zoo(...) was designed for this, and avoids dplyr and cut.

library(xts)
# creates sample dataset...
set.seed(1)
data <- rnorm(1000)
data.xts <- as.xts(data, as.POSIXct("2007-08-24 17:30:00") +
                     1800 * (1:length(data)))

# using aggregate.zoo(...)
as.hourly <- function(x) format(x,"%Y-%m %H")
result    <- aggregate(data.xts,by=as.hourly,mean)
result    <- data.frame(result)
head(result)
#                result
# 2007-08 00 0.12236024
# 2007-08 01 0.41593567
# 2007-08 02 0.22670817
# 2007-08 03 0.23402842
# 2007-08 04 0.22175078
# 2007-08 05 0.05081899

Upvotes: 1

akrun
akrun

Reputation: 887118

Try

library(dplyr)
res <- dat %>% 
           group_by(month=format(datetime, '%m'),
              #year=format(datetime, '%Y'), #if you need year also
              # as grouping variable
              hour=format(as.POSIXct(cut(datetime, breaks='hour')), '%H')) %>%
           summarise(Meanval=mean(val, na.rm=TRUE))   


 head(res,3)
 #  month hour     Meanval
 #1    01   00 -0.02780036
 #2    01   01 -0.06589948
 #3    01   02 -0.02166218

Update

If your datetime is POSIXlt you could convert it to POSIXct.

  dat$datetime <- as.POSIXlt(dat$datetime)

By running the above code, I get the error

   # Error: column 'datetime' has unsupported type

You could use mutate and convert the datetime to POSIXct class by as.POSIXct

  res1 <-  dat %>% 
               mutate(datetime= as.POSIXct(datetime)) %>%
               group_by(month=format(datetime, '%m'),
                 #year=format(datetime, '%Y'), #if you need year also
                 # as grouping variable
                  hour=format(as.POSIXct(cut(datetime, breaks='hour')), '%H')) %>%
               summarise(Meanval=mean(val, na.rm=TRUE))  

data

set.seed(24)
dat <- data.frame(datetime=seq(Sys.time(), by='1 hour', length.out=2000),
    val=rnorm(2000))

Upvotes: 1

Related Questions