Reputation: 41
This is a time series of hourly smart meter data with freq = 24. It was measured over three days, so the first day is [1:24], the second [25:48], and the third [49:72].
I want the mean for each hour of the day across the three days. For example, for the first hour:
(t[1]+t[25]+t[49])/3
so that I can make a boxplot for the 24 hours of the day over the 3 days.
x <- c(0.253, 0.132, 0.144, 0.272, 0.192, 0.132, 0.209, 0.255, 0.131,
0.136, 0.267, 0.166, 0.139, 0.238, 0.236, 1.75, 0.32, 0.687,
0.528, 1.198, 1.961, 1.171, 0.498, 1.28, 2.267, 2.605, 2.776,
4.359, 3.062, 2.264, 1.212, 1.809, 2.536, 2.48, 0.531, 0.515,
0.61, 0.867, 0.804, 2.282, 3.016, 0.998, 2.332, 0.612, 0.785,
1.292, 2.057, 0.396, 0.455, 0.283, 0.131, 0.147, 0.272, 0.198,
0.13, 0.19, 0.257, 0.149, 0.134, 0.251, 0.215, 0.133, 1.755,
1.855, 1.938, 1.471, 0.528, 0.842, 0.223, 0.256, 0.239, 0.113)
Upvotes: 3
Views: 3079
Reputation: 60984
Because you did not post an easy-to-use set of example data, let's first generate some:
time_series = runif(72)
The next step is to change the structure of the dataset from a 1d vector to a 2d matrix with one row per hour and one column per day. This saves you a lot of index bookkeeping:
time_matrix = matrix(time_series, 24, 3)
and use apply to calculate the hourly means (if you like apply, take a look at the plyr package for more nice functions, see this link for more detail):
hourly_means = apply(time_matrix, 1, mean)
> hourly_means
[1] 0.2954238 0.6791355 0.6113670 0.5775792 0.3614329 0.4414882 0.6206761
[8] 0.2079882 0.6238492 0.4069143 0.6333607 0.5254185 0.6685191 0.3629751
[15] 0.3715500 0.2637383 0.2730713 0.3170541 0.6053016 0.6550780 0.4031117
[22] 0.6857810 0.4492246 0.4795785
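As a quick sanity check on the reshape, here is a minimal sketch. The fixed seed is my addition, so the numbers will differ from the output above. It relies on matrix() filling column by column, so each column holds one day and each row one hour; rowMeans is a base R shortcut that should give the same result as apply(time_matrix, 1, mean).

```r
set.seed(42)                      # assumption: fixed seed, only for reproducibility
time_series <- runif(72)          # 3 days x 24 hourly values
time_matrix <- matrix(time_series, nrow = 24, ncol = 3)

# matrix() fills column-wise: row h holds hour h of day 1, day 2 and day 3
stopifnot(identical(time_matrix[1, ], time_series[c(1, 25, 49)]))

# Row means are the hourly means across the three days
hourly_means <- rowMeans(time_matrix)

# The first entry matches the formula from the question
manual_first <- (time_series[1] + time_series[25] + time_series[49]) / 3
stopifnot(isTRUE(all.equal(hourly_means[1], manual_first)))
```

If the data were stored day by row instead, matrix(..., byrow = TRUE) with swapped dimensions and colMeans would be the equivalent.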
However, if you use ggplot2 there is no need to precalculate anything for the boxplots; ggplot2 does this for you:
library(ggplot2)
library(reshape2)
# Notice the use of melt to reshape the matrix into long format
# (columns Var1 = hour, Var2 = day, value)
# Also notice factor(), which turns Var1 into a categorical variable
ggplot(melt(time_matrix), aes(x = factor(Var1), y = value)) +
  geom_boxplot()
Which yields what I think you were after:
The x-axis shows the hours of the day, the y-axis the value.
Note: the data you have is a time series. R has specific ways of dealing with time series, e.g. the ts function. I normally use ordinary R data objects (arrays, matrices), but you could take a look at the TimeSeries CRAN Task View for an overview of what R can do with time series.
To calculate the hourly means using a ts object (inspired by this SO post):
# Create a ts object
time_ts = ts(time_series, frequency = 24)
# Calculate the mean
> tapply(time_ts, cycle(time_ts), mean)
1 2 3 4 5 6 7 8
0.2954238 0.6791355 0.6113670 0.5775792 0.3614329 0.4414882 0.6206761 0.2079882
9 10 11 12 13 14 15 16
0.6238492 0.4069143 0.6333607 0.5254185 0.6685191 0.3629751 0.3715500 0.2637383
17 18 19 20 21 22 23 24
0.2730713 0.3170541 0.6053016 0.6550780 0.4031117 0.6857810 0.4492246 0.4795785
> aggregate(as.numeric(time_ts), list(hour = cycle(time_ts)), mean)
hour x
1 1 0.2954238
2 2 0.6791355
3 3 0.6113670
4 4 0.5775792
....
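To see why grouping by cycle() gives hourly means, here is a small sketch on a toy series whose values are simply the index 1..72 (the toy data and the name pos are mine, for illustration only). cycle() returns the position of each observation within its 24-hour period, so all observations for the same hour share one group label.

```r
# Toy series: the value at each position is just its index
time_ts <- ts(seq_len(72), frequency = 24)

# Position within the day (1..24) for every observation
pos <- as.integer(cycle(time_ts))

# Indices that belong to the first hour of each day
first_hour <- which(pos == 1)
first_hour   # 1 25 49
```

These are the same indices as in the formula from the question, (t[1]+t[25]+t[49])/3.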
Upvotes: 10
Reputation: 176728
You can do this easily with the boxplot function that comes with a basic R installation. Just create a data.frame with your original series and an index to identify the hour of each day.
Data <- data.frame(series=x, time=rep(1:24,3))
boxplot(series ~ time, data=Data)
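If you also want the hourly means themselves from the same data frame, a minimal sketch with base R's aggregate and tapply could look like this. The simulated x is only a stand-in for the series in the question, and the names hourly and hourly_vec are chosen for illustration.

```r
set.seed(1)                       # assumption: simulated stand-in for the real series
x <- runif(72)                    # 72 hourly readings = 3 days
Data <- data.frame(series = x, time = rep(1:24, 3))

# Mean of the three readings taken at each hour of the day, as a data.frame
hourly <- aggregate(series ~ time, data = Data, FUN = mean)
head(hourly, 3)

# The same numbers as a vector named by hour
hourly_vec <- tapply(Data$series, Data$time, mean)
```

With the real data, replacing the simulated x by the vector from the question should be the only change needed.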
Upvotes: 3