Reputation: 47
I am relatively new to R and am trying to fit a time series model to predict product volume for the next six months. My data set has two columns, Date (a timestamp) and Volume (the product volume in inventory on that particular day), for example:
Date Volume
24-06-2013 16986
25-06-2013 11438
26-06-2013 3378
27-06-2013 27392
28-06-2013 24666
01-07-2013 52368
02-07-2013 4468
03-07-2013 34744
04-07-2013 19806
05-07-2013 69230
08-07-2013 4618
09-07-2013 7140
10-07-2013 5792
11-07-2013 60130
12-07-2013 10444
15-07-2013 36198
16-07-2013 11268
I need to predict the product volume required in inventory for the six months after the end date of my data set (the last row is "14-06-2019" "3131076"). I have approximately six years of data, starting on 24-06-2013 and ending on 14-06-2019.
I tried using auto.arima in R on my data set and got many errors. I started researching ways to make my data suitable for time series analysis and came across the imputeTS and zoo packages.
I assume the dates matter for choosing the frequency value passed to the model, so I created a new column with the weekday of each date and counted how often each weekday occurs. The counts are not the same:
data1 <- mutate(data, day = weekdays(as.Date(Date)))
> View(data1)
> table(data1$day)
Friday Monday Saturday Sunday Thursday Tuesday Wednesday
213 214 208 207 206 211 212
No Volume values are missing for the dates that are present, but the table above shows that the count for each weekday differs, so some dates must be missing. How should I proceed? I have hit a dead end: I went through various posts here on the imputeTS and zoo packages without much success.
Can someone please guide me on how to proceed? Apologies to the admins and users if this looks like spam, but it is really important for me at the moment. I went through various time series tutorials elsewhere, but almost all of them use the AirPassengers data set, which I think has no such flaws.
Regards RD
library(imputeTS)
library(dplyr)
library(forecast)
setwd("C:/Users/sittu/Downloads")
data <- read.csv("ts.csv")
str(data)
$ Date : Factor w/ 1471 levels "01-01-2014","01-01-2015",..: 1132 1181 1221 1272 1324 22 71 115 163 213 ...
$ Volume: Factor w/ 1468 levels "0","1002551",..: 379 116 840 706 643 1095 1006 864 501 1254 ...
data$Volume <- as.numeric(data$Volume)
data$Date <- as.Date(data$Date, format = "%d/%m/%Y")
str(data)
'data.frame': 1471 obs. of 2 variables:
$ Date : Date, format: NA NA NA ... ## 1st Error now showing NA instead of dates
$ Volume: num 379 116 840 706 643 ...
Upvotes: 0
Views: 78
Reputation: 528
First, let's generate a dataset with missing data:
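library(dplyr)

set.seed(42)  # make the random sample reproducible
dates <- seq(as.Date("2018-01-01"), as.Date("2018-12-31"), by = "day")
volume <- floor(runif(length(dates), min = 2500, max = 50000))
dummy_df <- data.frame(date = dates, Volume = volume)
# keep a random 80% of the days, so about 20% of the dates are missing
df <- dummy_df %>% sample_frac(0.8)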
Here we generated a data frame with a date and a volume for each day of 2018, with 20% of the days missing (sample_frac(0.8)).
This should mimic your dataset, where data is missing for some days.
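Before the real file can be handled the same way, its Date and Volume columns have to be parsed. Judging by the str() output in the question, two things seem to go wrong: the dates use dashes (24-06-2013) while the format string uses slashes, which yields NA, and as.numeric() on a factor returns the level codes rather than the values. A minimal sketch of a fix, using a three-row stand-in for ts.csv (the stand-in data frame is an assumption, built from factors to mimic the question):

```r
# Hypothetical stand-in for read.csv("ts.csv"): both columns arrive as factors
raw <- data.frame(
  Date   = factor(c("24-06-2013", "25-06-2013", "26-06-2013")),
  Volume = factor(c("16986", "11438", "3378"))
)

# as.numeric() on a factor gives the internal level codes, not the volumes
codes <- as.numeric(raw$Volume)

# Convert through character first to recover the real numbers
raw$Volume <- as.numeric(as.character(raw$Volume))

# The format string must match the separator used in the file ("-", not "/")
raw$Date <- as.Date(as.character(raw$Date), format = "%d-%m-%Y")

str(raw)
```

If Volume really is read as a factor, the file probably contains some non-numeric entries; those become NA here (with a warning) and are worth inspecting. The weekday table in the question was also built with as.Date(Date) without a format, so it may not reflect the real weekdays and is worth recomputing after this step.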
Next, we want to find the days with no volume data:
Df_full_dates <- as.data.frame(dates) %>%
left_join(df,by=c('dates'='date'))
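To see which dates are missing before filling anything in, the observed dates can be compared against a complete daily sequence. The sketch below uses the first seven dates from the question's sample, already parsed as Date (the variable names are illustrative):

```r
# First seven observed dates from the question's sample
observed <- as.Date(c("2013-06-24", "2013-06-25", "2013-06-26", "2013-06-27",
                      "2013-06-28", "2013-07-01", "2013-07-02"))

# Every calendar day between the first and last observation
all_days <- seq(min(observed), max(observed), by = "day")

# Days that never appear in the data
missing_days <- all_days[!all_days %in% observed]

# POSIXlt wday: 0 = Sunday, 6 = Saturday (independent of locale)
is_weekend <- as.POSIXlt(missing_days)$wday %in% c(0, 6)

print(missing_days)
print(is_weekend)
```

If the gaps turn out to be mostly weekends, as this sample suggests, treating the data as a business-day series may be more appropriate than filling those days with a value.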
Now you want to replace the NA values (which correspond to days with no data) with a volume. I used 0 here, but if the data is genuinely missing rather than zero, you might prefer the monthly average or some other specific value; I cannot tell from your sample what suits your data best:
Df_full_dates[is.na(Df_full_dates)] <- 0
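If zero is not a sensible value for a missing day, linear interpolation or carrying the last observation forward are common alternatives. A small sketch with the zoo package mentioned in the question, on a made-up ten-day series (the values are illustrative only):

```r
library(zoo)

# Made-up daily series with gaps, standing in for the joined data frame
full <- data.frame(
  dates  = seq(as.Date("2018-01-01"), as.Date("2018-01-10"), by = "day"),
  Volume = c(100, NA, 300, NA, NA, 600, 700, NA, 900, 1000)
)

# Linear interpolation between the neighbouring observed values
full$Volume_interp <- na.approx(full$Volume, na.rm = FALSE)

# Last observation carried forward
full$Volume_locf <- na.locf(full$Volume, na.rm = FALSE)

print(full)
```

Which option is appropriate depends on what a missing day means: a day with no stock movement, a day the warehouse was closed, or a recording gap.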
From there, you have a dataset with a value for each day, and you should be able to fit a model to predict the volume in future months.
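As a rough sketch of that last step, the complete daily series can be turned into a ts object and passed to auto.arima from the forecast package. The weekly frequency of 7 and the 26-week horizon are assumptions; if the real data only covers business days, a frequency of 5 may fit better. The random data below only shows the mechanics and has no structure to forecast:

```r
library(forecast)

set.seed(1)
dates  <- seq(as.Date("2018-01-01"), as.Date("2018-12-31"), by = "day")
volume <- floor(runif(length(dates), min = 2500, max = 50000))

# Daily observations with an assumed weekly seasonal period
volume_ts <- ts(volume, frequency = 7)

# Let auto.arima pick the model orders
fit <- auto.arima(volume_ts)

# Forecast roughly six months ahead (26 weeks of daily values)
fc <- forecast(fit, h = 26 * 7)

print(head(fc$mean))
```

A six-month daily forecast from a single ARIMA model will usually have wide prediction intervals, so aggregating to weekly or monthly totals before modelling is worth considering.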
Let me know if you have any questions.
Upvotes: 1