Reputation: 4060
I am wishing to (arithmetically) average daily data and thus convert my daily time series into a weekly one.
Following this thread: How does one compute the mean of weekly data by column using R? , I am using the xts library.
# Averages daily time series into weekly time series
# where my source is a zoo object
source.w <- apply.weekly(source, colMeans)
The problem I am having is that it averages the series taking tuesday through next monday data.
I am searching for options to average my daily data from monday through friday.
Any hints?
Here is a bit more:
# here is part of my data, from a "blé colza.txt" file
24/07/2012 250.5 499
23/07/2012 264.75 518.25
20/07/2012 269.25 525.25
19/07/2012 267 522.5
18/07/2012 261.25 517
17/07/2012 265.75 522.25
16/07/2012 264.25 523.25
13/07/2012 258.25 517
12/07/2012 253.75 513
11/07/2012 246.25 512.75
10/07/2012 248 515
09/07/2012 247 519.25
06/07/2012 243.25 508.25
05/07/2012 245 508.5
04/07/2012 236 500.5
03/07/2012 234 497.75
02/07/2012 234.25 489.75
29/06/2012 229 490.25
28/06/2012 229.75 487.25
27/06/2012 229.75 493
26/06/2012 226.5 486
25/06/2012 220 482.25
22/06/2012 214.25 472.5
21/06/2012 212 469.5
20/06/2012 210.25 473.75
19/06/2012 208 472.75
18/06/2012 206.75 462.5
15/06/2012 203 456.5
14/06/2012 205.25 460.5
13/06/2012 205.25 465.25
12/06/2012 205.25 469
11/06/2012 208 471.5
08/06/2012 208 468.5
07/06/2012 208 471.25
06/06/2012 208 467
05/06/2012 208 458.75
04/06/2012 208 457.5
01/06/2012 208 463.5
31/05/2012 208 466.75
30/05/2012 208 468
29/05/2012 212.75 469.75
28/05/2012 212.75 469.75
25/05/2012 212.75 465.5
# Loads external libraries
library("zoo") # or require("zoo")
library("xts") # or require("xts")
# Loads data as a zoo object
source <- read.zoo("blé colza.txt", sep=",", dec=".", header=T, na.strings="NA", format="%d/%m/%Y")
# Averages daily time series into weekly time series
# https://stackoverflow.com/questions/11129562/how-does-one-compute-the-mean-of-weekly- data-by-column-using-r
source.w <- apply.weekly(source, colMeans)
Upvotes: 0
Views: 4641
Reputation: 4060
Looking again at my problem at hand.
It's straighforward using the xts library.
# say you have xts object name 'dat'
ep <- endpoints(dat, on = 'weeks') #
period.apply(x = dat, INDEX = ep, FUN = mean)
Upvotes: 1
Reputation: 10540
follow-up on Joshua Ulrich's answer.
On my system (kUbuntu 12), the following did not retrieve a zone.tab file
tzfile <- file.path(R.home("share"), "zoneinfo", "zone.tab")
However, I was able to find zone.tab by
locate zone.tab
For some reason (file permissions perhaps), I was not able to point to that zone.tab file directly, i.e. writing:
tzfile <- "usr/share/zoneinfo/zone.tab"
returned:
Error in file(file, "rt") : cannot open the connection
In addition: Warning message:
In file(file, "rt") :
cannot open file 'usr/share/zoneinfo/zone.tab': No such file or directory
Problem solved after making a local copy of zone.tab and pointing to that copy:
tzfile <- "~/R/zone.tab"
Now if you Google for zone.tab, you'll find a copy of zone.tab online, in case your system doesn't have one or it's corrupted or whatever. Here is such a place:
http://www.ietf.org/timezones/data/zone.tab
P.S. I'm <15 so I can't post a comment, which is what I would otherwise have done.
Upvotes: 1
Reputation: 176728
mrdwab's answer only happens to work because they share a timezone (or a characteristic thereof) with the OP. To illustrate:
Lines <-
"24/07/2012 250.5 499
23/07/2012 264.75 518.25
20/07/2012 269.25 525.25
19/07/2012 267 522.5
18/07/2012 261.25 517
17/07/2012 265.75 522.25
16/07/2012 264.25 523.25
13/07/2012 258.25 517
12/07/2012 253.75 513
11/07/2012 246.25 512.75
10/07/2012 248 515
09/07/2012 247 519.25
06/07/2012 243.25 508.25
05/07/2012 245 508.5
04/07/2012 236 500.5
03/07/2012 234 497.75
02/07/2012 234.25 489.75
29/06/2012 229 490.25
28/06/2012 229.75 487.25
27/06/2012 229.75 493
26/06/2012 226.5 486
25/06/2012 220 482.25
22/06/2012 214.25 472.5
21/06/2012 212 469.5
20/06/2012 210.25 473.75
19/06/2012 208 472.75
18/06/2012 206.75 462.5
15/06/2012 203 456.5
14/06/2012 205.25 460.5
13/06/2012 205.25 465.25
12/06/2012 205.25 469
11/06/2012 208 471.5
08/06/2012 208 468.5
07/06/2012 208 471.25
06/06/2012 208 467
05/06/2012 208 458.75
04/06/2012 208 457.5
01/06/2012 208 463.5
31/05/2012 208 466.75
30/05/2012 208 468
29/05/2012 212.75 469.75
28/05/2012 212.75 469.75
25/05/2012 212.75 465.5"
# Get R's timezone information (from ?Sys.timezone)
tzfile <- file.path(R.home("share"), "zoneinfo", "zone.tab")
tzones <- read.delim(tzfile, row.names = NULL, header = FALSE,
col.names = c("country", "coords", "name", "comments"),
as.is = TRUE, fill = TRUE, comment.char = "#")
# Run the analysis on each timezone
out <- list()
library(xts)
for(i in seq_along(tzones$name)) {
tzn <- tzones$name[i]
Sys.setenv(TZ=tzn)
con <- textConnection(Lines)
Source <- read.zoo(con, format="%d/%m/%Y")
out[[tzn]] <- apply.weekly(Source, colMeans)
}
Now you can run head(out,5)
and see that some of the output differ based on the timezone used:
head(out,5)
$`Europe/Andorra`
V2 V3
2012-05-27 212.75 467.625
2012-06-03 208.95 465.100
2012-06-10 208.00 467.400
2012-06-17 205.10 462.750
2012-06-24 212.90 474.150
2012-07-01 229.85 489.250
2012-07-08 241.05 506.850
2012-07-15 254.10 516.200
2012-07-22 265.60 521.050
2012-07-23 250.50 499.000
$`Asia/Dubai`
V2 V3
2012-05-27 212.75 467.625
2012-06-03 208.95 465.100
2012-06-10 208.00 467.400
2012-06-17 205.10 462.750
2012-06-24 212.90 474.150
2012-07-01 229.85 489.250
2012-07-08 241.05 506.850
2012-07-15 254.10 516.200
2012-07-22 265.60 521.050
2012-07-23 250.50 499.000
$`Asia/Kabul`
V2 V3
2012-05-27 212.75 467.625
2012-06-03 208.95 465.100
2012-06-10 208.00 467.400
2012-06-17 205.10 462.750
2012-06-24 212.90 474.150
2012-07-01 229.85 489.250
2012-07-08 241.05 506.850
2012-07-15 254.10 516.200
2012-07-22 265.60 521.050
2012-07-23 250.50 499.000
$`America/Antigua`
V2 V3
2012-05-25 212.750 465.500
2012-06-01 209.900 467.550
2012-06-08 208.000 464.600
2012-06-15 205.350 464.550
2012-06-22 210.250 470.200
2012-06-29 227.000 487.750
2012-07-06 238.500 500.950
2012-07-13 250.650 515.400
2012-07-20 265.500 522.050
2012-07-24 257.625 508.625
$`America/Anguilla`
V2 V3
2012-05-25 212.750 465.500
2012-06-01 209.900 467.550
2012-06-08 208.000 464.600
2012-06-15 205.350 464.550
2012-06-22 210.250 470.200
2012-06-29 227.000 487.750
2012-07-06 238.500 500.950
2012-07-13 250.650 515.400
2012-07-20 265.500 522.050
2012-07-24 257.625 508.625
The more robust solution is to ensure that your timezone is correctly represented, either by using Sys.setenv(TZ="<yourTZ>")
to set it globally or indexTZ(Source) <- "<yourTZ>"
to set it for each individual object.
Upvotes: 7
Reputation: 193687
I was able to reproduce your problem, and you can solve it using period.apply()
and custom "endpoints".
First, the data you provided, in a format that others can read in easily.
temp = structure(list(V1 = structure(c(33L, 32L, 29L, 27L, 25L, 23L,
22L, 19L, 17L, 15L, 13L, 12L, 9L, 7L, 5L, 3L, 2L, 41L, 39L, 37L,
36L, 35L, 31L, 30L, 28L, 26L, 24L, 21L, 20L, 18L, 16L, 14L, 11L,
10L, 8L, 6L, 4L, 1L, 43L, 42L, 40L, 38L, 34L), .Label = c("01/06/2012",
"02/07/2012", "03/07/2012", "04/06/2012", "04/07/2012", "05/06/2012",
"05/07/2012", "06/06/2012", "06/07/2012", "07/06/2012", "08/06/2012",
"09/07/2012", "10/07/2012", "11/06/2012", "11/07/2012", "12/06/2012",
"12/07/2012", "13/06/2012", "13/07/2012", "14/06/2012", "15/06/2012",
"16/07/2012", "17/07/2012", "18/06/2012", "18/07/2012", "19/06/2012",
"19/07/2012", "20/06/2012", "20/07/2012", "21/06/2012", "22/06/2012",
"23/07/2012", "24/07/2012", "25/05/2012", "25/06/2012", "26/06/2012",
"27/06/2012", "28/05/2012", "28/06/2012", "29/05/2012", "29/06/2012",
"30/05/2012", "31/05/2012"), class = "factor"), V2 = c(250.5,
264.75, 269.25, 267, 261.25, 265.75, 264.25, 258.25, 253.75,
246.25, 248, 247, 243.25, 245, 236, 234, 234.25, 229, 229.75,
229.75, 226.5, 220, 214.25, 212, 210.25, 208, 206.75, 203, 205.25,
205.25, 205.25, 208, 208, 208, 208, 208, 208, 208, 208, 208,
212.75, 212.75, 212.75), V3 = c(499, 518.25, 525.25, 522.5, 517,
522.25, 523.25, 517, 513, 512.75, 515, 519.25, 508.25, 508.5,
500.5, 497.75, 489.75, 490.25, 487.25, 493, 486, 482.25, 472.5,
469.5, 473.75, 472.75, 462.5, 456.5, 460.5, 465.25, 469, 471.5,
468.5, 471.25, 467, 458.75, 457.5, 463.5, 466.75, 468, 469.75,
469.75, 465.5)), .Names = c("V1", "V2", "V3"), class = "data.frame", row.names = c(NA,
-43L))
We'll do come clean-up and convert the object to an xts
object.
temp$V1 = as.Date(temp$V1, format="%d/%m/%Y")
library(xts)
temp.x = xts(temp[-1], order.by=temp$V1)
Now. we try the apply.weekly()
function, but it doesn't give us what you want.
apply.weekly(temp.x, colMeans)
# V2 V3
# 2012-05-28 212.75 467.625
# 2012-06-04 208.95 465.100
# 2012-06-11 208.00 467.400
# 2012-06-18 205.10 462.750
# 2012-06-25 212.90 474.150
# 2012-07-02 229.85 489.250
# 2012-07-09 241.05 506.850
# 2012-07-16 254.10 516.200
# 2012-07-23 265.60 521.050
# 2012-07-24 250.50 499.000
To use period.apply()
you need to specify the endpoints of your periods (which can be irregular). Here, our first period is just the first date, and from there, it's every five days. That leaves a few days left over, so we add nrow(temp.x)
as the end of our final period.
ep = c(0, seq(1, nrow(temp.x), by = 5), nrow(temp.x))
period.apply(temp.x, INDEX = ep, FUN = colMeans)
# V2 V3
# 2012-05-25 212.750 465.500
# 2012-06-01 209.900 467.550
# 2012-06-08 208.000 464.600
# 2012-06-15 205.350 464.550
# 2012-06-22 210.250 470.200
# 2012-06-29 227.000 487.750
# 2012-07-06 238.500 500.950
# 2012-07-13 250.650 515.400
# 2012-07-20 265.500 522.050
# 2012-07-24 257.625 508.625
Upvotes: 3
Reputation: 1196
I ran your example and If I understand the problem correctly, the apply.weekly
function is aggregating the first friday with the first monday of your data. I don't use the xts
package, so someone else would have to provide more insight on that. I would convert the dates to a vector of dates with the date of the Monday of each week representing every observation of that week. ?strptime
summaries the codes I use for the conversions.
# Get the year of the first observation
start_year <- format(time(source)[1],"%Y")
# Convert this into a date for the 1st of Jan in that year.
start_date <- as.Date(strptime(paste(start_year, "1 1"), "%Y %d %m"))
# Using the difftime function determine the distance (days) since the first day of the first year.
jul_day <- as.numeric(difftime(time(source),start_date),units="days")
# Get the date of the Monday before each observation and add it to the start of the year.
mondays <- start_date + (jul_day - (jul_day-1)%%7)
# the %% calculates the remainder.
# to check that it has worked convert the mondays vector into day names.
format(mondays, "%A")
# And now you can aggregate the observations using the mondays vector.
source.w <- aggregate(source[,1:2], mondays, "mean")
Upvotes: 2