Paul Rougieux
Paul Rougieux

Reputation: 11399

How to compute daily average over 31 days for 15 years, taking into account missing values?

This question was marked as duplicate. I don't think it is a duplicate because the specific issues of

Have not been dealt with elsewhere. I have worked on an answer which I am not allowed to paste in the original question. Therefore I paste it here.

Based on daily data for 15 years from 1993 to 2008. How to compute the daily average, for the variable Open in the file, for each day of the year, based on a 31 day Window centred on the day of interest. Thus, 15⨯31 = 465 dates contribute to the statistics of one day.

Output is just 365 values out of the 15 years

The file can be downloaded from here: http://chart.yahoo.com/table.csv?s=sbux&a=2&b=01&c=1993&d=2&e=01&f=2008&g=d&q=q&y=0&z=sbux&x=.csv

Upvotes: 2

Views: 234

Answers (1)

Paul Rougieux
Paul Rougieux

Reputation: 11399

Load packages and data

library(lubridate)
library(dplyr)
dtf <- read.csv("http://chart.yahoo.com/table.csv?s=sbux&a=2&b=01&c=1993&d=2&e=01&f=2008&g=d&q=q&y=0&z=sbux&x=.csv", stringsAsFactors = FALSE)
# I prefer lower case column names
names(dtf) <- tolower(names(dtf))    

The lubridate package has a nice function ddays() that adds a number of days. It deals with February 29. For example

ymd("2008-03-01") - ddays(15)
# [1] "2008-02-15 UTC"
ymd("2007-03-01") - ddays(15)
# [1] "2007-02-14 UTC"

Add minus15 and plus15 dates to the dataset, these will be the time bounds over which the average should be calculated for a given date in a given year.

dtf <- dtf %>% 
    mutate(date = ymd(date),
           minus15 = date - ddays(15),
           plus15 = date + ddays(15),
           monthday = substr(as.character(date),6,10),
           year = year(date),
           plotdate = ymd(paste(2008,monthday,sep="-"))) 

calendardays <- dtf %>% 
    select(monthday) %>% 
    distinct() %>%
    arrange(monthday) 

Create a function that gives the average over all those 15 years for a given day :

meanday <- function(givenday, dtf){
    # Extract the given day minus 15 days in all years available
    # Day minus 15 days will differ for example for march first 
    # in years where there is a february 29
    lowerbound <- dtf$minus15[dtf$monthday == givenday]
    # Produce the series of 31 days around the given day
    # that is the lower bound + 30 days
    filterdates <- lapply(lowerbound, function(x) x + ddays(0:30))
    filterdates <- Reduce(c, filterdates)
    # filter all of these days 
    dtfgivenday <- dtf %>%
        filter(date %in% filterdates) 
    return(mean(dtfgivenday$open))
}

Use that function over all dates available in the calendar:

meandays <- sapply(calendardays$monthday, meanday, dtf)
calendardays <- calendardays %>% 
    mutate(mean = meandays,
           plotdate = ymd(paste(2008,monthday,sep="-")))

Plots

plot(dtf$date,dtf$open,type="l")
library(ggplot2)
ggplot(dtf, aes(x=date,y=open, color = as.factor(year))) + geom_line()
ggplot(dtf, aes(x=plotdate,y=open, color = as.factor(year))) + geom_line()
ggplot(calendardays, aes(x=plotdate, y=mean)) + geom_line()

Time series

All series on one year

Plot of the moving average

Is it strange to see a periodicity appear here?

Upvotes: 2

Related Questions