Reputation: 2247
I have a daily-frequency data frame of stock prices that I am trying to convert to a time series and then run decompose() on. Daily stock data has no weekend rows, so I am not sure how to use the usual ts(frequency = 365) call.
Code that I have attempted:
data
library(tidyverse)
library(lubridate)
library(quantmod)
library(zoo)
library(xts)
adani_green_df <- read.csv("https://raw.githubusercontent.com/johnsnow09/covid19-df_stack-code/main/adani_daily_data.csv")
Getting the day of the year to use as the start input of ts():
adani_green_df %>%
head(n = 1) %>%
mutate(day_of_year = lubridate::yday(date)) %>%
select(date, day_of_year)
######### output ###########
date day_of_year
<date> <dbl>
1 2018-06-18 169
I am not sure whether the call used in the code below, ts(frequency = 365, start = c(2018,169)), is correct.
adani_green_df %>%
select(date,CLOSE) %>%
`colnames<-`(c("date","close")) %>%
column_to_rownames("date") %>%
as.xts() %>%
# print(.,calendar = TRUE) %>%
ts(frequency = 365, start = c(2018,169)) %>%
decompose() %>%
plot()
Reason for doubt: the plot above doesn't show any 2022 data, whereas the maximum date in the data is 2022-09-19, as checked in the code below:
# getting data range of Stock prices
adani_green_df %>%
select(date) %>%
summary()
######### output ##########
date
Min. :2018-06-18
1st Qu.:2019-07-14
Median :2020-08-06
Mean :2020-08-04
3rd Qu.:2021-08-26
Max. :2022-09-19
Update 1: Attempting code based on Answer 1
library(timetk)
plot_stl_diagnostics(adani_green_df, .date_var = date, .value = CLOSE)
plot_stl_diagnostics(adani_green_df, .date_var = date, .value = CLOSE,.feature_set = c("season"))
Upvotes: 0
Views: 552
Reputation: 269885
Assuming that you want a full cycle to be a year, the key issue, as in your previous question, is how to align the data so that there are the same number of points in each year. Calculate a table of the number of points in each year, tab. Then find min_days, the minimum number of points per year excluding the first and last years, which may be short, and remove any point whose position within its year is greater than min_days, giving zz. Using that we can generate a regular ts series, tser, which is aligned with the year and can be used with decompose.
library(zoo)
z <- read.csv.zoo(u) # u is URL given in Note at end
yr <- as.numeric(format(time(z), "%Y"))
tab <- table(yr)
min_days <- min(tab[seq(2, length(tab)-1)])
zz <- z[ave(yr, yr, FUN = seq_along) <= min_days]
tser <- ts(zz, start = c(yr[1], min_days - tab[[1]] + 1), freq = min_days)
u <- "https://raw.githubusercontent.com/johnsnow09/covid19-df_stack-code/main/adani_daily_data.csv"
Upvotes: 0
Reputation: 2247
I have also attempted to solve this by filling in the missing weekend dates, carrying the last non-NA close price forward to overwrite the NAs, and then decomposing and plotting the result.
library(padr)
adani_green_df_filled <- adani_green_df %>%
select(date,CLOSE) %>%
padr::pad(.) %>% # fill missing dates
mutate(CLOSE = na.locf(CLOSE)) # carry forward non na values to replace na values
adani_green_df_filled
adani_green_df_filled %>%
`colnames<-`(c("date","close")) %>%
column_to_rownames("date") %>%
as.xts() %>%
ts(frequency = 365, start = c(2018,169)) %>%
decompose() %>%
plot()
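The effect of padding can be checked on synthetic data. The sketch below uses base R only and assumes a made-up weekday-only series over the same date range as the question (the prices are random, and exchange holidays are ignored); it pads to every calendar day, carries the last close forward, and confirms that a frequency of 365 then reaches 2022.

```r
# Synthetic weekday-only closes (assumption: a stand-in for the real CSV)
trading_days <- seq(as.Date("2018-06-18"), as.Date("2022-09-19"), by = "day")
trading_days <- trading_days[!format(trading_days, "%u") %in% c("6", "7")]
set.seed(1)
prices <- data.frame(date = trading_days,
                     CLOSE = 100 + cumsum(rnorm(length(trading_days))))

# Pad to every calendar day; days without a row get NA after the merge
all_days <- data.frame(date = seq(min(prices$date), max(prices$date), by = "day"))
padded <- merge(all_days, prices, by = "date", all.x = TRUE)

# Carry the last observed close forward (base R stand-in for zoo::na.locf;
# valid here because the first value is not NA)
last_seen <- cummax(seq_along(padded$CLOSE) * !is.na(padded$CLOSE))
padded$CLOSE <- padded$CLOSE[last_seen]

# One observation per calendar day, so frequency = 365 now roughly matches
padded_ts <- ts(padded$CLOSE, frequency = 365, start = c(2018, 169))
end(padded_ts)  # the year component should now be 2022
```

Note that the flat weekend values are artificial, so any seasonal component estimated from the padded series partly reflects the padding itself.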
Upvotes: 0
Reputation: 23608
The problem you face here is that the ts function does not use your dates: it creates an internal counter starting at 2018, day 169, and advances it by one for each observation, without skipping non-business days. With frequency = 365 each trading day is treated as a calendar day, so the series appears to end before 2022. One option is to set the frequency to roughly the number of trading days in a year (around 220 to 250, depending on the market and holidays).
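A small base R sketch can make this counting behaviour visible. It assumes a synthetic weekday-only series over the question's date range (random values, no exchange holidays), so the observations-per-year figure it derives is higher than for real market data; the variable names are illustrative only.

```r
# Synthetic weekday-only series (assumption: stand-in for the real closes)
dates <- seq(as.Date("2018-06-18"), as.Date("2022-09-19"), by = "day")
dates <- dates[!format(dates, "%u") %in% c("6", "7")]  # drop Saturdays and Sundays
set.seed(1)
close <- cumsum(rnorm(length(dates)))

# frequency = 365: every observation is counted as one calendar day
ts365 <- ts(close, frequency = 365, start = c(2018, 169))
end(ts365)  # year component falls short of 2022

# Estimate observations per year from the data itself
span_days <- as.numeric(diff(range(dates)))
obs_per_year <- round(length(dates) / span_days * 365.25)

# Rescale the start position to the new frequency
start_pos <- round(169 / 365 * obs_per_year)
ts_bd <- ts(close, frequency = obs_per_year, start = c(2018, start_pos))
end(ts_bd)  # year component now reaches 2022

plot(decompose(ts_bd))
```

With real data, obs_per_year would come out lower because of holidays, which is why an approximate trading-day count is used as the frequency.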
timetk, modeltime
Personally, I use timetk, modeltime and tidymodels for dealing with stock prices. timetk can decompose data without weekends and shows the plots in plotly. modeltime in combination with tidymodels can handle any forecasting you need.
Example code, plot not included.
library(timetk)
plot_stl_diagnostics(adani_green_df, .date_var = date, .value = CLOSE) # column names as in the question's data
#frequency = 5 observations per 1 week
#trend = 66 observations per 3 months
fable, feasts, tsibble
You can also use fable, the successor to forecast.
What you can do is create an index (or use an existing variable as the index). In the example below I create a variable idx to use as the index. Keep any groupings you need in mind when creating the index.
Forecasting is done based on this index, so forecast(model, h = 12) will forecast 12 index values into the future. You then need to translate those index values back into your date column. Example code below, plot not included.
library(fable)
library(feasts)
library(tsibble)
library(dplyr)
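# Use the row number as a gap-free index so weekends do not appear as missing
fc <- adani_green_df %>%
  mutate(idx = row_number()) %>%
  as_tsibble(index = idx)
# STL decomposition of the closing price (column CLOSE in the question's data)
dcmp <- fc %>%
  model(stl = STL(CLOSE))
components(dcmp) %>% autoplot()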
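Translating forecast index values back into dates could look like the base R sketch below. The helper next_weekdays is hypothetical (it is not part of fable or tsibble), it ignores exchange holidays, and n_obs is a placeholder for the number of rows in your data.

```r
# Hypothetical helper: the next h weekdays after last_date (holidays ignored)
next_weekdays <- function(last_date, h) {
  # 2 * h + 7 calendar days always contain at least h weekdays
  candidates <- seq(last_date + 1, by = "day", length.out = 2 * h + 7)
  head(candidates[!format(candidates, "%u") %in% c("6", "7")], h)
}

last_date <- as.Date("2022-09-19")  # max date reported in the question
n_obs <- 1000                       # placeholder: use nrow(adani_green_df)
h <- 5

# Lookup table from forecast index to calendar date
future <- data.frame(idx = n_obs + seq_len(h),
                     date = next_weekdays(last_date, h))
future
```

The resulting table can be joined to the forecast output on idx to get a date for each forecast step.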
Of course you could also use prophet, but the two options above offer ways of fitting prophet models without calling prophet directly. This saves renaming columns and the other steps needed to make everything work.
Upvotes: 1