DPatrick

Reputation: 431

ts object does not work for daily data in R - really confused

I have spent a whole day searching for the answer to this question and still cannot figure out how this works (I am relatively new to R).

The data: I have the daily revenue of a store. The starting date is November 2017, and the end date is February 2020. (It is not the typical January to December data for every year.) There are no missing values, and every day's sales are recorded. There are 2 columns: date (in proper date format) and revenue (in numerical format).

I am trying to build a time series forecasting model for my sales data. One prerequisite is that I need to transform my data into a ts object. All the posts I have seen online deal with yearly or monthly data, and I have not yet seen anyone mention daily data.

I tried to convert my data to a ts object this way (I named my data "d"):

d_ts <- ts(d$revenue, start=min(d$date), end = max(d$date), frequency = 365)

I then got really weird results as such:

Start = c(17420, 1) 
End = c(18311, 1)
Frequency = 365 

[1]    1174.77     214.92      10.00     684.86    7020.04   11302.50   30613.55   29920.98   24546.49   22089.89   30291.65   32993.05   26517.11   39670.38   30361.32   17510.72
  [17]    9888.76    3032.27    1229.74    2426.36 ....... [ reached getOption("max.print") -- omitted 324216 entries ]

There are 892 days in this dataset, so how come the ts object's dimension is 325,216 x 1?

I looked into this book called "Hands-On Time-Series with R" and found the following excerpt:

[image: excerpt from the book, not reproduced here]

This basically means that ts() does NOT work for daily data. Is this why my ts() conversion is messed up?

My questions are ...

(1) How can I convert my daily revenue data to a time series object before feeding it into a model, if ts() does not work for daily data? All those time-series models require my data to be in time-series format.

(2) Does the fact that my data does not start in January 2017 and end in December 2019 (i.e. the perfect 12-months-per-year data shown in many online posts) cause any complications? If so, what should I adjust so that the time series forecast is meaningful?

I have been stuck on this issue and cannot wrap my head around it. I would really appreciate your help!

Upvotes: 1

Views: 2498

Answers (2)

Rob Hyndman

Reputation: 31820

With daily data, you are better off using a tsibble class rather than a ts class. There are modelling and forecasting tools available via the fable package.

library(tsibble)
library(fable)

set.seed(1)
d_tsibble <- data.frame(
    date = seq(as.Date("2017-11-01"), by = "day", length.out = 892),
    revenue = rnorm(892)
  ) %>%
  as_tsibble(index = date)

d_tsibble
#> # A tsibble: 892 x 2 [1D]
#>    date       revenue
#>    <date>       <dbl>
#>  1 2017-11-01  -0.626
#>  2 2017-11-02   0.184
#>  3 2017-11-03  -0.836
#>  4 2017-11-04   1.60 
#>  5 2017-11-05   0.330
#>  6 2017-11-06  -0.820
#>  7 2017-11-07   0.487
#>  8 2017-11-08   0.738
#>  9 2017-11-09   0.576
#> 10 2017-11-10  -0.305
#> # … with 882 more rows

d_tsibble %>%
  model(
    arima = ARIMA(revenue)
  ) %>%
  forecast(h = "14 days")
#> # A fable: 14 x 4 [1D]
#> # Key:     .model [1]
#>    .model date          revenue .distribution   
#>    <chr>  <date>          <dbl> <dist>          
#>  1 arima  2020-04-11 -0.0178    N(-1.8e-02, 1.1)
#>  2 arima  2020-04-12 -0.0117    N(-1.2e-02, 1.1)
#>  3 arima  2020-04-13 -0.00765   N(-7.7e-03, 1.1)
#>  4 arima  2020-04-14 -0.00501   N(-5.0e-03, 1.1)
#>  5 arima  2020-04-15 -0.00329   N(-3.3e-03, 1.1)
#>  6 arima  2020-04-16 -0.00215   N(-2.2e-03, 1.1)
#>  7 arima  2020-04-17 -0.00141   N(-1.4e-03, 1.1)
#>  8 arima  2020-04-18 -0.000925  N(-9.2e-04, 1.1)
#>  9 arima  2020-04-19 -0.000606  N(-6.1e-04, 1.1)
#> 10 arima  2020-04-20 -0.000397  N(-4.0e-04, 1.1)
#> 11 arima  2020-04-21 -0.000260  N(-2.6e-04, 1.1)
#> 12 arima  2020-04-22 -0.000171  N(-1.7e-04, 1.1)
#> 13 arima  2020-04-23 -0.000112  N(-1.1e-04, 1.1)
#> 14 arima  2020-04-24 -0.0000732 N(-7.3e-05, 1.1)

Created on 2020-04-01 by the reprex package (v0.3.0)

Upvotes: 0

Miff

Reputation: 7941

The ts function can work with any time interval; the interval is defined by the start and end points. Because you are passing dates, one time unit corresponds to one day, since dates are stored internally as a number of days. The help file at ?ts also shows examples of how to use annual or quarterly data.
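To see where the 325,216 entries in the question come from, here is a small sketch using the integer start and end values from the question's printout (17420 and 18311). The variable names are made up for illustration. With frequency = 365, ts() expects 365 observations per time unit, and here each time unit is a single day:

```r
# Start and end as printed in the question (dates stored as day counts)
start_day <- 17420
end_day   <- 18311

n_days <- end_day - start_day + 1           # 892 daily observations
n_365  <- (end_day - start_day) * 365 + 1   # 325216 slots when frequency = 365

# ts() recycles the 892 values to fill every slot between start and end
x_365 <- ts(seq_len(n_days), start = start_day, end = end_day, frequency = 365)
x_1   <- ts(seq_len(n_days), start = start_day, end = end_day, frequency = 1)

length(x_365)  # 325216
length(x_1)    # 892
```

So the data were not corrupted; the 892 values were simply repeated until the much longer series was filled.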

To read in your daily data correctly you need to set frequency=1. Using some data similar in structure to what you've got:

#Compile a dataframe like yours
library(lubridate)
set.seed(0)
d <- data.frame(date=seq.Date(dmy("01/11/2017"), by="day", length.out=892))
d$revenue <- runif(892)

head(d)
#date   revenue
# 1 2017-11-01 0.8966972
# 2 2017-11-02 0.2655087
# 3 2017-11-03 0.3721239
# 4 2017-11-04 0.5728534
# 5 2017-11-05 0.9082078
# 6 2017-11-06 0.2016819

#Convert to timeseries object
d_ts <- ts(d$revenue, start=min(d$date), end = max(d$date), frequency = 1)

d_ts
# Time Series:
# Start = 17471
# End = 18362
# Frequency = 1
# [1] 0.896697200 0.265508663 0.372123900 0.572853363 0.908207790 0.201681931 0.898389685 0.944675269 0.660797792
# [10] 0.629114044 0.061786270 0.205974575 0.176556753 0.687022847 0.384103718 0.769841420 0.497699242 0.717618508
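With frequency = 1 the series carries no seasonal period. If a model needs one, a possible alternative (an assumption on my part, not something the code above does) is to leave out end and express start as a year and day-of-year pair, or to use frequency = 7 for a weekly pattern. The names first_year, first_doy, d_ts_year and d_ts_week below are purely illustrative, and frequency = 365 is only approximate because it ignores leap days:

```r
# Same simulated data as above, built with base R only
set.seed(0)
d <- data.frame(date = seq(as.Date("2017-11-01"), by = "day", length.out = 892))
d$revenue <- runif(892)

# Year and day of year of the first observation (1 Nov 2017 is day 305)
first_year <- as.integer(format(min(d$date), "%Y"))
first_doy  <- as.integer(format(min(d$date), "%j"))

# Omitting `end` lets ts() take the length from the data itself,
# so the series does not need to start in January
d_ts_year <- ts(d$revenue, start = c(first_year, first_doy), frequency = 365)

# Weekly seasonality: 7 observations per cycle
d_ts_week <- ts(d$revenue, frequency = 7)

length(d_ts_year)
frequency(d_ts_week)
```

Either way the object keeps all 892 observations, which is an easy check that the conversion did what was intended.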

Upvotes: 0
