Reputation: 2247

How to adjust a daily stock price data frame for weekends to create a time series that can be decomposed in R?

I have a daily-frequency data frame that I am trying to convert to a time series and then run decompose() on. Daily stock data has no weekend observations, so I am not sure how to apply the usual ts(frequency = 365) call.

Code that I have attempted:

data

library(tidyverse)
library(lubridate)
library(quantmod)
library(zoo)
library(xts)

adani_green_df <- read.csv("https://raw.githubusercontent.com/johnsnow09/covid19-df_stack-code/main/adani_daily_data.csv")

Getting the day of the year to use as the start argument of ts()

adani_green_df %>% 
  head(n = 1) %>% 
  mutate(day_of_year = lubridate::yday(date)) %>% 
  select(date, day_of_year)

######### output ###########
date       day_of_year
  <date>           <dbl>
1 2018-06-18         169

I am not sure whether the call used in the code below, ts(frequency = 365, start = c(2018,169)), is correct.

adani_green_df %>% 
  select(date,CLOSE) %>% 
  `colnames<-`(c("date","close")) %>% 
  column_to_rownames("date") %>% 
  as.xts() %>% 
  # print(.,calendar = TRUE) %>% 
  ts(frequency = 365, start = c(2018,169)) %>% 
  decompose() %>% 
  plot()

Reason for doubt: the plot above doesn't show any 2022 data, whereas the max date in the data is 2022-09-19, as checked in the code below:

# getting data range of Stock prices
adani_green_df %>% 
  select(date) %>% 
  summary()

######### output ##########
      date           
 Min.   :2018-06-18  
 1st Qu.:2019-07-14  
 Median :2020-08-06  
 Mean   :2020-08-04  
 3rd Qu.:2021-08-26  
 Max.   :2022-09-19

Update 1: Attempting code based on Answer 1

library(timetk)
plot_stl_diagnostics(adani_green_df, .date_var = date, .value = CLOSE)

plot_stl_diagnostics(adani_green_df, .date_var = date, .value = CLOSE,.feature_set = c("season"))

Upvotes: 0

Answers (3)

G. Grothendieck

Reputation: 269885

Assuming that you want a full cycle to be a year, the key issue, as in your previous question, is how to align the data so that there are the same number of points in each year. Calculate a table of the number of points in each year, tab. Then find min_days, the minimum number of points per year excluding the first and last years, which may be short. Remove any points whose position within their year is greater than min_days, giving zz. From that we can generate a regular ts series, tser, which is aligned with the year and can be used with decompose.

library(zoo)

z <- read.csv.zoo(u) # u is URL given in Note at end

yr <- as.numeric(format(time(z), "%Y"))
tab <- table(yr)
min_days <- min(tab[seq(2, length(tab)-1)])
zz <- z[ave(yr, yr, FUN = seq_along) <= min_days]
tser <- ts(zz, start = c(yr[1], min_days - tab[[1]] + 1), freq = min_days)

Note

u <- "https://raw.githubusercontent.com/johnsnow09/covid19-df_stack-code/main/adani_daily_data.csv"

Upvotes: 0

ViSa

Reputation: 2247

I have also attempted to solve this by filling in the missing weekend dates, carrying the last non-NA close price forward to overwrite the NAs, and then decomposing and plotting the result.

library(padr)

adani_green_df_filled <- adani_green_df %>% 
  select(date,CLOSE) %>% 
  padr::pad(.) %>%                  # fill missing dates
  mutate(CLOSE = na.locf(CLOSE))    # carry forward non na values to replace na values

adani_green_df_filled

adani_green_df_filled %>% 
  `colnames<-`(c("date","close")) %>% 
  column_to_rownames("date") %>% 
  as.xts() %>% 
  ts(frequency = 365, start = c(2018,169)) %>% 
  decompose() %>% 
  plot()

Upvotes: 0

phiver

Reputation: 23608

The problem you face here is that the ts function keeps an internal counter: it creates a time series starting at 2018, day 169, and then counts forward from there, treating each observation as one more day without skipping any non-business days. One option is adjusting the frequency to +/- 220, as this is roughly the number of trading days in a year.
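To make the counter behaviour concrete, here is a minimal base R sketch. It does not use the real data: trading_days and close are synthetic stand-ins (every weekday between the two dates from the question, with random-walk prices), and the frequency of 260 is an assumption that only fits this weekday-only series. For real data, count the observations per year rather than relying on a fixed figure, since exchange holidays lower the number.

```r
# Synthetic stand-in for the stock data: every weekday from 2018-06-18 to
# 2022-09-19 (real trading data would also drop exchange holidays).
all_days <- seq(as.Date("2018-06-18"), as.Date("2022-09-19"), by = "day")
is_weekday <- function(d) format(d, "%u") %in% as.character(1:5)  # "%u": 1 = Monday
trading_days <- all_days[is_weekday(all_days)]

set.seed(1)
close <- 100 + cumsum(rnorm(length(trading_days)))  # random-walk "prices"

# frequency = 365: ts() treats each observation as one calendar day,
# so the series runs out of observations before it reaches 2022.
ts_365 <- ts(close, frequency = 365, start = c(2018, 169))
end(ts_365)  # c(year, period); the year is 2021 here

# frequency = 260 (52 weeks * 5 weekdays), with the start expressed as the
# weekday count within 2018 instead of the calendar day of the year.
jan_to_start <- seq(as.Date("2018-01-01"), as.Date("2018-06-18"), by = "day")
start_pos <- sum(is_weekday(jan_to_start))
ts_wd <- ts(close, frequency = 260, start = c(2018, start_pos))
end(ts_wd)   # the year is now 2022

dcmp <- decompose(ts_wd)  # plot(dcmp) draws the usual four panels
```

The alignment is still approximate because years do not all contain the same number of weekdays, so the period within the final year can be off by a few observations.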

timetk, modeltime

Personally, I use timetk, modeltime and tidymodels for dealing with stock prices. timetk can handle decompositions of data without weekends and shows the plots in plotly. modeltime in combination with tidymodels can handle any forecasting you need.

Example code, plot not included.

library(timetk)
plot_stl_diagnostics(adani_green_df, .date_var = date, .value = CLOSE)
#frequency = 5 observations per 1 week
#trend = 66 observations per 3 months

fable, feasts, tsibble

You can also use fable, the successor to forecast.

What you can do is create an index (or use a variable as an index). In the example below I create a variable idx to use as the index. Keep any groupings you need in mind when creating an index.

Forecasting is done based on this index. So forecast(model, h = 12) will forecast 12 index values into the future. You then need to translate that back into your date column. Example code below, plot not included.

library(fable)
library(feasts)
library(tsibble)
library(dplyr)

fc <- adani_green_df %>% 
  mutate(idx = row_number()) %>% 
  tsibble(index = idx)

dcmp <- fc %>%
  model(stl = STL(CLOSE))

components(dcmp) %>% autoplot()
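The step of translating forecast index values back into dates can be sketched in base R as follows. next_weekdays is a hypothetical helper, not part of fable or tsibble, and it assumes that every weekday is a trading day; exchange holidays would need a proper trading calendar.

```r
# Hypothetical helper: the h weekdays that follow last_date (holidays ignored).
next_weekdays <- function(last_date, h) {
  # 2 * h + 7 calendar days always contain at least h weekdays
  candidates <- last_date + seq_len(2 * h + 7)
  weekdays_only <- candidates[format(candidates, "%u") %in% as.character(1:5)]
  weekdays_only[seq_len(h)]
}

last_date <- as.Date("2022-09-19")  # max date reported in the question
h <- 12                             # same horizon as forecast(model, h = 12)

# Lookup table: forecast step -> calendar date
future <- data.frame(step = seq_len(h), date = next_weekdays(last_date, h))
future
```

The resulting table can then be joined to the forecast output by step, that is, by idx minus the last observed idx.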

Of course, you could also use prophet, but the two options above include ways of using prophet without calling it directly. This saves renaming columns and the other adjustments that are needed to make prophet work.

Upvotes: 1
