tags
tags

Reputation: 4060

How to convert a daily times series into an averaged weekly?

I am wishing to (arithmetically) average daily data and thus convert my daily time series into a weekly one.

Following this thread: How does one compute the mean of weekly data by column using R? , I am using the xts library.

# Averages daily time series into weekly time series
# where my source is a zoo object
source.w <- apply.weekly(source, colMeans)

The problem I am having is that it averages the series taking tuesday through next monday data.

I am searching for options to average my daily data from monday through friday.

Any hints?

Here is a bit more:

    # here is part of my data, from a "blé colza.txt" file


    24/07/2012  250.5   499
    23/07/2012  264.75  518.25
    20/07/2012  269.25  525.25
    19/07/2012  267 522.5
    18/07/2012  261.25  517
    17/07/2012  265.75  522.25
    16/07/2012  264.25  523.25
    13/07/2012  258.25  517
    12/07/2012  253.75  513
    11/07/2012  246.25  512.75
    10/07/2012  248 515
    09/07/2012  247 519.25
    06/07/2012  243.25  508.25
    05/07/2012  245 508.5
    04/07/2012  236 500.5
    03/07/2012  234 497.75
    02/07/2012  234.25  489.75
    29/06/2012  229 490.25
    28/06/2012  229.75  487.25
    27/06/2012  229.75  493
    26/06/2012  226.5   486
    25/06/2012  220 482.25
    22/06/2012  214.25  472.5
    21/06/2012  212 469.5
    20/06/2012  210.25  473.75
    19/06/2012  208 472.75
    18/06/2012  206.75  462.5
    15/06/2012  203 456.5
    14/06/2012  205.25  460.5
    13/06/2012  205.25  465.25
    12/06/2012  205.25  469
    11/06/2012  208 471.5
    08/06/2012  208 468.5
    07/06/2012  208 471.25
    06/06/2012  208 467
    05/06/2012  208 458.75
    04/06/2012  208 457.5
    01/06/2012  208 463.5
    31/05/2012  208 466.75
    30/05/2012  208 468
    29/05/2012  212.75  469.75
    28/05/2012  212.75  469.75
    25/05/2012  212.75  465.5



# Loads external libraries
library("zoo") # or require("zoo")
library("xts") # or require("xts")

# Loads data as a zoo object
source <- read.zoo("blé colza.txt", sep=",", dec=".", header=T, na.strings="NA",     format="%d/%m/%Y")

# Averages daily time series into weekly time series
# https://stackoverflow.com/questions/11129562/how-does-one-compute-the-mean-of-weekly-    data-by-column-using-r
source.w <- apply.weekly(source, colMeans) 

Upvotes: 0

Views: 4641

Answers (5)

tags
tags

Reputation: 4060

Looking again at my problem at hand.

It's straighforward using the xts library.

# say you have xts object name 'dat'
ep <- endpoints(dat, on = 'weeks')                          # 
period.apply(x = dat, INDEX = ep, FUN = mean)

Upvotes: 1

PatrickT
PatrickT

Reputation: 10540

follow-up on Joshua Ulrich's answer.

On my system (kUbuntu 12), the following did not retrieve a zone.tab file

tzfile <- file.path(R.home("share"), "zoneinfo", "zone.tab")

However, I was able to find zone.tab by

locate zone.tab

For some reason (file permissions perhaps), I was not able to point to that zone.tab file directly, i.e. writing:

tzfile <- "usr/share/zoneinfo/zone.tab"

returned:

Error in file(file, "rt") : cannot open the connection
In addition: Warning message:
In file(file, "rt") :
  cannot open file 'usr/share/zoneinfo/zone.tab': No such file or directory 

Problem solved after making a local copy of zone.tab and pointing to that copy:

tzfile <- "~/R/zone.tab"

Now if you Google for zone.tab, you'll find a copy of zone.tab online, in case your system doesn't have one or it's corrupted or whatever. Here is such a place:

http://www.ietf.org/timezones/data/zone.tab

P.S. I'm <15 so I can't post a comment, which is what I would otherwise have done.

Upvotes: 1

Joshua Ulrich
Joshua Ulrich

Reputation: 176728

mrdwab's answer only happens to work because they share a timezone (or a characteristic thereof) with the OP. To illustrate:

Lines <- 
    "24/07/2012  250.5   499
    23/07/2012  264.75  518.25
    20/07/2012  269.25  525.25
    19/07/2012  267 522.5
    18/07/2012  261.25  517
    17/07/2012  265.75  522.25
    16/07/2012  264.25  523.25
    13/07/2012  258.25  517
    12/07/2012  253.75  513
    11/07/2012  246.25  512.75
    10/07/2012  248 515
    09/07/2012  247 519.25
    06/07/2012  243.25  508.25
    05/07/2012  245 508.5
    04/07/2012  236 500.5
    03/07/2012  234 497.75
    02/07/2012  234.25  489.75
    29/06/2012  229 490.25
    28/06/2012  229.75  487.25
    27/06/2012  229.75  493
    26/06/2012  226.5   486
    25/06/2012  220 482.25
    22/06/2012  214.25  472.5
    21/06/2012  212 469.5
    20/06/2012  210.25  473.75
    19/06/2012  208 472.75
    18/06/2012  206.75  462.5
    15/06/2012  203 456.5
    14/06/2012  205.25  460.5
    13/06/2012  205.25  465.25
    12/06/2012  205.25  469
    11/06/2012  208 471.5
    08/06/2012  208 468.5
    07/06/2012  208 471.25
    06/06/2012  208 467
    05/06/2012  208 458.75
    04/06/2012  208 457.5
    01/06/2012  208 463.5
    31/05/2012  208 466.75
    30/05/2012  208 468
    29/05/2012  212.75  469.75
    28/05/2012  212.75  469.75
    25/05/2012  212.75  465.5"

# Get R's timezone information (from ?Sys.timezone)
tzfile <- file.path(R.home("share"), "zoneinfo", "zone.tab")
tzones <- read.delim(tzfile, row.names = NULL, header = FALSE,
  col.names = c("country", "coords", "name", "comments"),
  as.is = TRUE, fill = TRUE, comment.char = "#")

# Run the analysis on each timezone
out <- list()
library(xts)
for(i in seq_along(tzones$name)) {
  tzn <- tzones$name[i]
  Sys.setenv(TZ=tzn)
  con <- textConnection(Lines)
  Source <- read.zoo(con, format="%d/%m/%Y")
  out[[tzn]] <- apply.weekly(Source, colMeans)
}

Now you can run head(out,5) and see that some of the output differ based on the timezone used:

head(out,5)
$`Europe/Andorra`
               V2      V3
2012-05-27 212.75 467.625
2012-06-03 208.95 465.100
2012-06-10 208.00 467.400
2012-06-17 205.10 462.750
2012-06-24 212.90 474.150
2012-07-01 229.85 489.250
2012-07-08 241.05 506.850
2012-07-15 254.10 516.200
2012-07-22 265.60 521.050
2012-07-23 250.50 499.000

$`Asia/Dubai`
               V2      V3
2012-05-27 212.75 467.625
2012-06-03 208.95 465.100
2012-06-10 208.00 467.400
2012-06-17 205.10 462.750
2012-06-24 212.90 474.150
2012-07-01 229.85 489.250
2012-07-08 241.05 506.850
2012-07-15 254.10 516.200
2012-07-22 265.60 521.050
2012-07-23 250.50 499.000

$`Asia/Kabul`
               V2      V3
2012-05-27 212.75 467.625
2012-06-03 208.95 465.100
2012-06-10 208.00 467.400
2012-06-17 205.10 462.750
2012-06-24 212.90 474.150
2012-07-01 229.85 489.250
2012-07-08 241.05 506.850
2012-07-15 254.10 516.200
2012-07-22 265.60 521.050
2012-07-23 250.50 499.000

$`America/Antigua`
                V2      V3
2012-05-25 212.750 465.500
2012-06-01 209.900 467.550
2012-06-08 208.000 464.600
2012-06-15 205.350 464.550
2012-06-22 210.250 470.200
2012-06-29 227.000 487.750
2012-07-06 238.500 500.950
2012-07-13 250.650 515.400
2012-07-20 265.500 522.050
2012-07-24 257.625 508.625

$`America/Anguilla`
                V2      V3
2012-05-25 212.750 465.500
2012-06-01 209.900 467.550
2012-06-08 208.000 464.600
2012-06-15 205.350 464.550
2012-06-22 210.250 470.200
2012-06-29 227.000 487.750
2012-07-06 238.500 500.950
2012-07-13 250.650 515.400
2012-07-20 265.500 522.050
2012-07-24 257.625 508.625

The more robust solution is to ensure that your timezone is correctly represented, either by using Sys.setenv(TZ="<yourTZ>") to set it globally or indexTZ(Source) <- "<yourTZ>" to set it for each individual object.

Upvotes: 7

A5C1D2H2I1M1N2O1R2T1
A5C1D2H2I1M1N2O1R2T1

Reputation: 193687

I was able to reproduce your problem, and you can solve it using period.apply() and custom "endpoints".

First, the data you provided, in a format that others can read in easily.

temp = structure(list(V1 = structure(c(33L, 32L, 29L, 27L, 25L, 23L, 
22L, 19L, 17L, 15L, 13L, 12L, 9L, 7L, 5L, 3L, 2L, 41L, 39L, 37L, 
36L, 35L, 31L, 30L, 28L, 26L, 24L, 21L, 20L, 18L, 16L, 14L, 11L, 
10L, 8L, 6L, 4L, 1L, 43L, 42L, 40L, 38L, 34L), .Label = c("01/06/2012", 
"02/07/2012", "03/07/2012", "04/06/2012", "04/07/2012", "05/06/2012", 
"05/07/2012", "06/06/2012", "06/07/2012", "07/06/2012", "08/06/2012", 
"09/07/2012", "10/07/2012", "11/06/2012", "11/07/2012", "12/06/2012", 
"12/07/2012", "13/06/2012", "13/07/2012", "14/06/2012", "15/06/2012", 
"16/07/2012", "17/07/2012", "18/06/2012", "18/07/2012", "19/06/2012", 
"19/07/2012", "20/06/2012", "20/07/2012", "21/06/2012", "22/06/2012", 
"23/07/2012", "24/07/2012", "25/05/2012", "25/06/2012", "26/06/2012", 
"27/06/2012", "28/05/2012", "28/06/2012", "29/05/2012", "29/06/2012", 
"30/05/2012", "31/05/2012"), class = "factor"), V2 = c(250.5, 
264.75, 269.25, 267, 261.25, 265.75, 264.25, 258.25, 253.75, 
246.25, 248, 247, 243.25, 245, 236, 234, 234.25, 229, 229.75, 
229.75, 226.5, 220, 214.25, 212, 210.25, 208, 206.75, 203, 205.25, 
205.25, 205.25, 208, 208, 208, 208, 208, 208, 208, 208, 208, 
212.75, 212.75, 212.75), V3 = c(499, 518.25, 525.25, 522.5, 517, 
522.25, 523.25, 517, 513, 512.75, 515, 519.25, 508.25, 508.5, 
500.5, 497.75, 489.75, 490.25, 487.25, 493, 486, 482.25, 472.5, 
469.5, 473.75, 472.75, 462.5, 456.5, 460.5, 465.25, 469, 471.5, 
468.5, 471.25, 467, 458.75, 457.5, 463.5, 466.75, 468, 469.75, 
469.75, 465.5)), .Names = c("V1", "V2", "V3"), class = "data.frame", row.names = c(NA, 
-43L))

We'll do come clean-up and convert the object to an xts object.

temp$V1 = as.Date(temp$V1, format="%d/%m/%Y")
library(xts)
temp.x = xts(temp[-1], order.by=temp$V1)

Now. we try the apply.weekly() function, but it doesn't give us what you want.

apply.weekly(temp.x, colMeans)
#                V2      V3
# 2012-05-28 212.75 467.625
# 2012-06-04 208.95 465.100
# 2012-06-11 208.00 467.400
# 2012-06-18 205.10 462.750
# 2012-06-25 212.90 474.150
# 2012-07-02 229.85 489.250
# 2012-07-09 241.05 506.850
# 2012-07-16 254.10 516.200
# 2012-07-23 265.60 521.050
# 2012-07-24 250.50 499.000

To use period.apply() you need to specify the endpoints of your periods (which can be irregular). Here, our first period is just the first date, and from there, it's every five days. That leaves a few days left over, so we add nrow(temp.x) as the end of our final period.

ep = c(0, seq(1, nrow(temp.x), by = 5), nrow(temp.x))
period.apply(temp.x, INDEX = ep, FUN = colMeans)
#                 V2      V3
# 2012-05-25 212.750 465.500
# 2012-06-01 209.900 467.550
# 2012-06-08 208.000 464.600
# 2012-06-15 205.350 464.550
# 2012-06-22 210.250 470.200
# 2012-06-29 227.000 487.750
# 2012-07-06 238.500 500.950
# 2012-07-13 250.650 515.400
# 2012-07-20 265.500 522.050
# 2012-07-24 257.625 508.625

Upvotes: 3

Jase_
Jase_

Reputation: 1196

I ran your example and If I understand the problem correctly, the apply.weekly function is aggregating the first friday with the first monday of your data. I don't use the xts package, so someone else would have to provide more insight on that. I would convert the dates to a vector of dates with the date of the Monday of each week representing every observation of that week. ?strptime summaries the codes I use for the conversions.

# Get the year of the first observation
start_year <- format(time(source)[1],"%Y")
# Convert this into a date for the 1st of Jan in that year.
start_date <- as.Date(strptime(paste(start_year, "1 1"), "%Y %d %m"))

# Using the difftime function determine the distance (days) since the first day of the first year.
jul_day <- as.numeric(difftime(time(source),start_date),units="days")
# Get the date of the Monday before each observation and add it to the start of the year. 
mondays <- start_date + (jul_day - (jul_day-1)%%7)
# the %% calculates the remainder.
# to check that it has worked convert the mondays vector into day names.
format(mondays, "%A")

# And now you can aggregate the observations using the mondays vector.
source.w <- aggregate(source[,1:2], mondays, "mean")

Upvotes: 2

Related Questions