Reputation: 11399
This question was marked as duplicate. I don't think it is a duplicate because the specific issues of
Have not been dealt with elsewhere. I have worked on an answer which I am not allowed to paste in the original question. Therefore I paste it here.
Based on daily data for
15 years from 1993 to 2008
. How to compute the daily average, for the variableOpen
in the file, for each day of the year, based on a 31 day Window centred on the day of interest. Thus,15⨯31 = 465 dates
contribute to the statistics of one day.Output is just 365 values out of the 15 years
The file can be downloaded from here: http://chart.yahoo.com/table.csv?s=sbux&a=2&b=01&c=1993&d=2&e=01&f=2008&g=d&q=q&y=0&z=sbux&x=.csv
Upvotes: 2
Views: 234
Reputation: 11399
Load packages and data
library(lubridate)
library(dplyr)
dtf <- read.csv("http://chart.yahoo.com/table.csv?s=sbux&a=2&b=01&c=1993&d=2&e=01&f=2008&g=d&q=q&y=0&z=sbux&x=.csv", stringsAsFactors = FALSE)
# I prefer lower case column names
names(dtf) <- tolower(names(dtf))
The lubridate package has a nice function ddays() that adds a number of days. It deals with February 29. For example
ymd("2008-03-01") - ddays(15)
# [1] "2008-02-15 UTC"
ymd("2007-03-01") - ddays(15)
# [1] "2007-02-14 UTC"
Add minus15 and plus15 dates to the dataset, these will be the time bounds over which the average should be calculated for a given date in a given year.
dtf <- dtf %>%
mutate(date = ymd(date),
minus15 = date - ddays(15),
plus15 = date + ddays(15),
monthday = substr(as.character(date),6,10),
year = year(date),
plotdate = ymd(paste(2008,monthday,sep="-")))
calendardays <- dtf %>%
select(monthday) %>%
distinct() %>%
arrange(monthday)
Create a function that gives the average over all those 15 years for a given day :
meanday <- function(givenday, dtf){
# Extract the given day minus 15 days in all years available
# Day minus 15 days will differ for example for march first
# in years where there is a february 29
lowerbound <- dtf$minus15[dtf$monthday == givenday]
# Produce the series of 31 days around the given day
# that is the lower bound + 30 days
filterdates <- lapply(lowerbound, function(x) x + ddays(0:30))
filterdates <- Reduce(c, filterdates)
# filter all of these days
dtfgivenday <- dtf %>%
filter(date %in% filterdates)
return(mean(dtfgivenday$open))
}
Use that function over all dates available in the calendar:
meandays <- sapply(calendardays$monthday, meanday, dtf)
calendardays <- calendardays %>%
mutate(mean = meandays,
plotdate = ymd(paste(2008,monthday,sep="-")))
Plots
plot(dtf$date,dtf$open,type="l")
library(ggplot2)
ggplot(dtf, aes(x=date,y=open, color = as.factor(year))) + geom_line()
ggplot(dtf, aes(x=plotdate,y=open, color = as.factor(year))) + geom_line()
ggplot(calendardays, aes(x=plotdate, y=mean)) + geom_line()
Is it strange to see a periodicity appear here?
Upvotes: 2