Reputation: 243
I'm having trouble calculating the average temperature by hour.
I have a data frame with date, time (hh:mm:ss p.m./a.m.) and temperature. What I need is the mean temperature for each hour, so that I can plot the daily variation of temperature.
I'm new to R, but I tried with what I know: I first converted the hours into numbers, then extracted the first two characters, and then calculated the mean, but it didn't work very well. Moreover, I have so many files to analyze that it would be much better to have something more automated and clean than the "solution" I found.
I believe there must be a better way to calculate averages by hour in R, so I've been looking for the answer in other posts here. Unfortunately, I couldn't find a clear answer about extracting statistics from time data.
My data looks like this:
date hour temperature
1 28/12/2013 13:03:01 41.572
2 28/12/2013 13:08:01 46.059
3 28/12/2013 13:13:01 48.55
4 28/12/2013 13:18:01 49.546
5 28/12/2013 13:23:01 49.546
6 28/12/2013 13:28:01 49.546
7 28/12/2013 13:33:01 50.044
8 28/12/2013 13:38:01 50.542
9 28/12/2013 13:43:01 50.542
10 28/12/2013 13:48:01 51.04
11 28/12/2013 13:53:01 51.538
12 28/12/2013 13:58:01 51.538
13 28/12/2013 14:03:01 50.542
14 28/12/2013 14:08:01 51.04
15 28/12/2013 14:13:01 51.04
16 28/12/2013 14:18:01 52.534
17 28/12/2013 14:23:01 53.031
18 28/12/2013 14:28:01 53.031
19 28/12/2013 14:33:01 53.031
20 28/12/2013 14:38:01 51.538
21 28/12/2013 14:43:01 53.031
22 28/12/2013 14:48:01 53.529
etc. (24 hours of data)
And I would like R to calculate average per hour (without taking into account differences in minutes or seconds, just by hour)
Any suggestion? Thank you very much in advance!
Regards, Maria
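The question describes the times as hh:mm:ss p.m./a.m., although the sample data uses a 24-hour clock. If the raw files really do use the 12-hour form, a format string with %H will not parse them. A minimal base-R sketch, assuming the markers are written exactly as "a.m."/"p.m." and that R runs in an English or C locale (R's %p only matches the locale's own AM/PM strings); the time values below are hypothetical:

```r
# Hypothetical times in the "hh:mm:ss p.m./a.m." form described in the question.
date <- c("28/12/2013", "28/12/2013", "28/12/2013")
hour <- c("01:03:01 p.m.", "01:58:01 p.m.", "11:03:01 a.m.")

# %p expects "AM"/"PM" (assumed English or C locale), so remove the dots
# and upper-case the marker: "p.m." becomes "PM".
hour_clean <- toupper(gsub(".", "", hour, fixed = TRUE))

# %I is the 12-hour clock hour; tz = "UTC" avoids daylight-saving gaps.
ts <- as.POSIXct(paste(date, hour_clean),
                 format = "%d/%m/%Y %I:%M:%S %p", tz = "UTC")

# Hour of day on a 24-hour clock (0-23), ready to group by.
hod <- as.integer(format(ts, "%H"))
print(hod)
```

Once the hour of day is available, the average itself is an ordinary grouped mean, for example aggregate(temperature ~ hod, data = df, FUN = mean) on a data frame holding both columns.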
Upvotes: 7
Views: 24870
Reputation: 146
Try this example:
library(foqat)
met2=avri(met[,c(1,2)], bkip="1 hour", mode="ncycle", value=24)
#plot it
geom_avri(
met2,
cave=2, csd=3,
alpha=0.5, lcc="#0050b3", rff="#40a9ff",
xlab="Hour of day", ylab=bquote(Temp~" "~(degree*C))
)
Upvotes: 0
Reputation: 364677
Combine the date and hour columns into a POSIXct column and cut() it by hourly breaks:
df <- read.table(header=TRUE, stringsAsFactors=FALSE, text="
date hour temperature
28/12/2013 13:03:01 41.572
28/12/2013 13:08:01 46.059
28/12/2013 13:13:01 48.55
28/12/2013 13:18:01 49.546
28/12/2013 13:23:01 49.546
28/12/2013 13:28:01 49.546
28/12/2013 13:33:01 50.044
28/12/2013 13:38:01 50.542
28/12/2013 13:43:01 50.542
28/12/2013 13:48:01 51.04
28/12/2013 13:53:01 51.538
28/12/2013 13:58:01 51.538
28/12/2013 14:03:01 50.542
28/12/2013 14:08:01 51.04
28/12/2013 14:13:01 51.04
28/12/2013 14:18:01 52.534
28/12/2013 14:23:01 53.031
28/12/2013 14:28:01 53.031
28/12/2013 14:33:01 53.031
28/12/2013 14:38:01 51.538
28/12/2013 14:43:01 53.031
28/12/2013 14:48:01 53.529
28/12/2013 15:01:01 50.77")
df$datehour <- cut(as.POSIXct(paste(df$date, df$hour),
format="%d/%m/%Y %H:%M:%S"), breaks="hour")
head(df)
date hour temperature datehour
1 28/12/2013 13:03:01 41.572 2013-12-28 13:00:00
2 28/12/2013 13:08:01 46.059 2013-12-28 13:00:00
3 28/12/2013 13:13:01 48.550 2013-12-28 13:00:00
4 28/12/2013 13:18:01 49.546 2013-12-28 13:00:00
5 28/12/2013 13:23:01 49.546 2013-12-28 13:00:00
6 28/12/2013 13:28:01 49.546 2013-12-28 13:00:00
Now aggregate by that hourly column:
means <- aggregate(temperature ~ datehour, df, mean)
head(means)
datehour temperature
1 2013-12-28 13:00:00 49.17192
2 2013-12-28 14:00:00 52.23470
3 2013-12-28 15:00:00 50.77000
plot(as.POSIXct(means$datehour), means$temperature, type="l", las=1,
main="Hourly Avg Temperatures", xlab="Hour", ylab="")
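The question mentions many files to process. One way to automate the steps above is to wrap them in a function and apply it to each file. A minimal sketch, assuming every file is whitespace-separated with a header row date hour temperature like the sample; hourly_means(), the folder name data and the .txt pattern are hypothetical:

```r
# Hypothetical helper: hourly mean temperature for one data frame with columns
# date (dd/mm/yyyy), hour (hh:mm:ss, 24-hour clock) and temperature,
# using the same cut() + aggregate() steps as above.
hourly_means <- function(df) {
  ts <- as.POSIXct(paste(df$date, df$hour),
                   format = "%d/%m/%Y %H:%M:%S", tz = "UTC")
  df$datehour <- cut(ts, breaks = "hour")
  aggregate(temperature ~ datehour, data = df, FUN = mean)
}

# Apply it to every file in a folder. The folder "data" and the ".txt"
# pattern are assumptions; adjust read.table() to the real file layout.
files <- list.files("data", pattern = "\\.txt$", full.names = TRUE)
results <- lapply(files, function(f) {
  out <- hourly_means(read.table(f, header = TRUE, stringsAsFactors = FALSE))
  out$file <- basename(f)  # keep track of which file each row came from
  out
})
all_means <- do.call(rbind, results)  # NULL if no files matched
```

The sketch passes tz = "UTC" to as.POSIXct(), which the answer does not do; treating the logged clock times as UTC means a daylight-saving change in the local time zone cannot drop or duplicate an hour.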
But for time series data, I like to use the xts package:
library(xts)
df.xts <- xts(df$temperature, as.POSIXct(paste(df$date, df$hour),
format="%d/%m/%Y %H:%M:%S"))
head(df.xts)
[,1]
2013-12-28 13:03:01 41.572
2013-12-28 13:08:01 46.059
2013-12-28 13:13:01 48.550
2013-12-28 13:18:01 49.546
2013-12-28 13:23:01 49.546
2013-12-28 13:28:01 49.546
means <- period.apply(df.xts, endpoints(df.xts, "hours"), mean)
head(means)
[,1]
2013-12-28 13:58:01 49.17192
2013-12-28 14:48:01 52.23470
2013-12-28 15:01:01 50.77000
Notice how the timestamps are the last entry of each hour. We can align the timestamps (down) to the beginning of the hour with this function:
align.time.down = function(x,n){ index(x) = index(x)-n; align.time(x,n) }
means.rounded <- align.time.down(means, 60*60)
# 2nd argument is the number of seconds to adjust/round to,
# just like function align.time()
head(means.rounded)
[,1]
2013-12-28 13:00:00 49.17192
2013-12-28 14:00:00 52.23470
2013-12-28 15:00:00 50.77000
plot(means.rounded, las=1, main="Hourly Avg Temperatures")
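Why does shifting back by n turn align.time() into a floor? A base-R sketch of the arithmetic, assuming that align.time() moves each timestamp up to the next multiple of n seconds since the epoch; round_up() and round_down_via_shift() are hypothetical names used only for illustration:

```r
# Assumption: align.time() maps t (seconds since the epoch) to t + (n - t %% n),
# i.e. it always rounds UP to the next multiple of n.
round_up <- function(t, n) t + (n - t %% n)

# align.time.down() subtracts n first. Because (t - n) %% n == t %% n,
# round_up(t - n, n) simplifies to t - t %% n, which is a floor.
round_down_via_shift <- function(t, n) round_up(t - n, n)

x <- as.POSIXct("2013-12-28 13:58:01", tz = "UTC")
secs <- as.numeric(x)

up   <- as.POSIXct(round_up(secs, 3600), origin = "1970-01-01", tz = "UTC")
down <- as.POSIXct(round_down_via_shift(secs, 3600), origin = "1970-01-01", tz = "UTC")

# Base R alternative that floors directly, in the timestamp's own time zone.
floored <- as.POSIXct(trunc(x, units = "hours"))

print(c(up      = format(up, "%Y-%m-%d %H:%M:%S"),
        down    = format(down, "%Y-%m-%d %H:%M:%S"),
        trunc   = format(floored, "%Y-%m-%d %H:%M:%S")))
```

A timestamp already on the hour stays put under the shifted version, since t %% n is 0. trunc() works on the clock time of the timestamp's time zone, whereas arithmetic on epoch seconds aligns to UTC hour boundaries; the two agree for UTC and for zones with whole-hour offsets.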
Upvotes: 15
Reputation: 1022
It is always easier if sample data and the expected output are given in the question.
Solution with the data.table package:
library(data.table)
data <- fread('temp.csv', sep=',') # assuming your data is in temp.csv
# if you skipped the step above, convert your existing data frame to a data.table
data <- data.table(data)
> str(data)
Classes ‘data.table’ and 'data.frame': 12 obs. of 3 variables:
$ date : chr "28/12/2013" "28/12/2013" "28/12/2013" "28/12/2013" ...
$ hour : chr "13:03:01" "13:08:01" "13:13:01" "13:18:01" ...
$ temperature: num 41.6 46.1 48.5 49.5 49.5 ...
> data
date hour temperature avg
1: 27/12/2013 13:00:00 42.99 35.78455
2: 27/12/2013 14:00:00 65.97 35.78455
3: 27/12/2013 15:00:00 63.57 35.78455
data[,list(avg=mean(temperature)),by=hour] # grouped by the exact hour string, so this only works if times are already on the hour
hour avg
1: 13:00:00 42.99
2: 14:00:00 65.97
3: 15:00:00 63.57
data[,list(avg=mean(temperature)),by="date,hour"] # grouped by date, then hour
date hour avg
1: 27/12/2013 13:00:00 42.99
2: 27/12/2013 14:00:00 65.97
3: 27/12/2013 15:00:00 63.57
data[,list(avg=mean(temperature)),by=list(date,hour(as.POSIXct(data$hour, format = "%H:%M:%S")))] # group by date and hour of day, ignoring minutes and seconds
date hour avg
1: 27/12/2013 1 29.530
2: 27/12/2013 4 65.970
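The last data.table call still groups by date as well as by hour, so each day gets its own rows. For the daily-variation plot the question asks for, one option is to pool all days and average per clock hour. A base-R sketch using a few rows copied from the question's sample (with real multi-day files, each hour would be averaged over every day):

```r
# A few rows taken from the question's sample data.
df <- data.frame(
  date = rep("28/12/2013", 4),
  hour = c("13:03:01", "13:08:01", "14:03:01", "14:08:01"),
  temperature = c(41.572, 46.059, 50.542, 51.04),
  stringsAsFactors = FALSE
)

ts <- as.POSIXct(paste(df$date, df$hour),
                 format = "%d/%m/%Y %H:%M:%S", tz = "UTC")
df$hod <- as.integer(format(ts, "%H"))  # hour of day 0-23; minutes/seconds dropped

# One row per clock hour, averaged across all dates in the data.
diurnal <- aggregate(temperature ~ hod, data = df, FUN = mean)
print(diurnal)
```

plot(diurnal$hod, diurnal$temperature, type = "b", xlab = "Hour of day") then draws the average daily cycle.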
Upvotes: 3