Reputation: 431
I have spent a full day searching for the answer to this question and still cannot figure out how this works (I am relatively new to R).
The data: I have the daily revenue of a store. The start date is in November 2017, and the end date is in February 2020 (so it is not the typical January-to-December yearly data). There are no missing values; every day's sales are recorded. There are 2 columns: date (in proper Date format) and revenue (numeric).
I am trying to build a time series forecasting model for my sales data. One prerequisite is that I need to transform my data into a ts object. All the posts I have seen online deal with yearly or monthly data; I have not yet seen anyone mention daily data.
I tried to convert my data to a ts object this way (I named my data "d"):
d_ts <- ts(d$revenue, start=min(d$date), end = max(d$date), frequency = 365)
I then got really weird results:
Start = c(17420, 1)
End = c(18311, 1)
Frequency = 365
[1] 1174.77 214.92 10.00 684.86 7020.04 11302.50 30613.55 29920.98 24546.49 22089.89 30291.65 32993.05 26517.11 39670.38 30361.32 17510.72
[17] 9888.76 3032.27 1229.74 2426.36 ....... [ reached getOption("max.print") -- omitted 324216 entries ]
There are 892 days in this dataset, so why does the ts object have dimensions 325,216 x 1?
I looked into this book called "Hands-On Time-Series with R" and found the following excerpt:
This basically means the ts() object does NOT work for daily data. Is this why my ts() conversion is messed up?
My questions are ...
(1) How can I turn my daily revenue data into a time series object before feeding it into a model, if ts() does not work for daily data? All those time-series models require my data to be in time-series format.
(2) Does the fact that my data does not start in January 2017 and end in December 2019 (i.e. the complete 12-month years shown in many online posts) cause any complications? If so, what should I adjust so that the time series forecasting is meaningful?
I have been stuck on this issue and cannot wrap my head around it. I really appreciate your help!
Upvotes: 1
Views: 2498
Reputation: 31820
With daily data, you are better off using a tsibble class rather than a ts class. There are modelling and forecasting tools available via the fable package.
library(tsibble)
library(fable)
set.seed(1)
d_tsibble <- data.frame(
date = seq(as.Date("2017-11-01"), by = "day", length.out = 892),
revenue = rnorm(892)
) %>%
as_tsibble(index = date)
d_tsibble
#> # A tsibble: 892 x 2 [1D]
#> date revenue
#> <date> <dbl>
#> 1 2017-11-01 -0.626
#> 2 2017-11-02 0.184
#> 3 2017-11-03 -0.836
#> 4 2017-11-04 1.60
#> 5 2017-11-05 0.330
#> 6 2017-11-06 -0.820
#> 7 2017-11-07 0.487
#> 8 2017-11-08 0.738
#> 9 2017-11-09 0.576
#> 10 2017-11-10 -0.305
#> # … with 882 more rows
d_tsibble %>%
model(
arima = ARIMA(revenue)
) %>%
forecast(h = "14 days")
#> # A fable: 14 x 4 [1D]
#> # Key: .model [1]
#> .model date revenue .distribution
#> <chr> <date> <dbl> <dist>
#> 1 arima 2020-04-11 -0.0178 N(-1.8e-02, 1.1)
#> 2 arima 2020-04-12 -0.0117 N(-1.2e-02, 1.1)
#> 3 arima 2020-04-13 -0.00765 N(-7.7e-03, 1.1)
#> 4 arima 2020-04-14 -0.00501 N(-5.0e-03, 1.1)
#> 5 arima 2020-04-15 -0.00329 N(-3.3e-03, 1.1)
#> 6 arima 2020-04-16 -0.00215 N(-2.2e-03, 1.1)
#> 7 arima 2020-04-17 -0.00141 N(-1.4e-03, 1.1)
#> 8 arima 2020-04-18 -0.000925 N(-9.2e-04, 1.1)
#> 9 arima 2020-04-19 -0.000606 N(-6.1e-04, 1.1)
#> 10 arima 2020-04-20 -0.000397 N(-4.0e-04, 1.1)
#> 11 arima 2020-04-21 -0.000260 N(-2.6e-04, 1.1)
#> 12 arima 2020-04-22 -0.000171 N(-1.7e-04, 1.1)
#> 13 arima 2020-04-23 -0.000112 N(-1.1e-04, 1.1)
#> 14 arima 2020-04-24 -0.0000732 N(-7.3e-05, 1.1)
Created on 2020-04-01 by the reprex package (v0.3.0)
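If the data already sits in a data frame with date and revenue columns, as in the question, the conversion can be applied to it directly. The sketch below is only illustrative: `d` is a hypothetical stand-in built from random numbers, and the `has_gaps()` check is an extra step (not part of the answer above) to confirm that no days are missing before modelling.

```r
library(tsibble)

# Hypothetical stand-in for the asker's data frame `d`
# (one row per day, columns `date` and `revenue`)
set.seed(1)
d <- data.frame(
  date = seq(as.Date("2017-11-01"), by = "day", length.out = 892),
  revenue = runif(892)
)

# Declare `date` as the time index; the daily interval is inferred from it
d_tsibble <- as_tsibble(d, index = date)

# has_gaps() returns a one-row tibble with a logical `.gaps` column;
# FALSE means every day between the first and last date is present
gap_check <- has_gaps(d_tsibble)
gap_check$.gaps
```

Because the index is an actual date, the series does not need to start in January or cover whole calendar years.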
Upvotes: 0
Reputation: 7941
The ts function can work with any time interval; the interval is defined by the start and end points. Because you are using dates, one unit corresponds to one day, as this is how dates are stored internally. The help file at ?ts also shows examples of how to use annual or quarterly data.
To read in your daily data correctly you need to set frequency=1. Using some data similar in structure to what you've got:
#Compile a dataframe like yours
library(lubridate)
set.seed(0)
d <- data.frame(date=seq.Date(dmy("01/11/2017"), by="day", length.out=892))
d$revenue <- runif(892)
head(d)
#date revenue
# 1 2017-11-01 0.8966972
# 2 2017-11-02 0.2655087
# 3 2017-11-03 0.3721239
# 4 2017-11-04 0.5728534
# 5 2017-11-05 0.9082078
# 6 2017-11-06 0.2016819
#Convert to timeseries object
d_ts <- ts(d$revenue, start=min(d$date), end = max(d$date), frequency = 1)
d_ts
# Time Series:
# Start = 17471
# End = 18362
# Frequency = 1
# [1] 0.896697200 0.265508663 0.372123900 0.572853363 0.908207790 0.201681931 0.898389685 0.944675269 0.660797792
# [10] 0.629114044 0.061786270 0.205974575 0.176556753 0.687022847 0.384103718 0.769841420 0.497699242 0.717618508
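For comparison, here is a minimal base-R sketch of why the call in the question returned 325,216 values. ts() sizes the series as roughly (end - start) * frequency + 1, so when start and end are day counts and frequency = 365, every day is treated as a period holding 365 observations and the 892 input values are recycled to fill it. The dates and random revenue below are stand-ins, not the asker's data.

```r
# Stand-in data with the same shape as the question: 892 consecutive days
dates <- seq(as.Date("2017-11-01"), by = "day", length.out = 892)
set.seed(0)
revenue <- runif(892)

# Dates are stored as days since 1970-01-01, so start/end are day counts
start_day <- as.numeric(min(dates))
end_day <- as.numeric(max(dates))

# frequency = 365: each time unit (a day here) is assumed to hold 365 observations
ts_365 <- ts(revenue, start = start_day, end = end_day, frequency = 365)

# frequency = 1: one observation per time unit, i.e. one per day
ts_1 <- ts(revenue, start = start_day, end = end_day, frequency = 1)

length(ts_365)  # (892 - 1) * 365 + 1 = 325216, input recycled to fill it
length(ts_1)    # 892, one value per day
```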
Upvotes: 0