rajeswa

Reputation: 47

Time Series application - Guidance Needed

I am relatively new to R and am trying to apply time series forecasting to a data set to predict product volume for the next six months. My data set has two columns, Date (timestamp) and Volume (product in inventory on that particular day), for example:

Date    Volume
24-06-2013  16986
25-06-2013  11438
26-06-2013  3378
27-06-2013  27392
28-06-2013  24666
01-07-2013  52368
02-07-2013  4468
03-07-2013  34744
04-07-2013  19806
05-07-2013  69230
08-07-2013  4618
09-07-2013  7140
10-07-2013  5792
11-07-2013  60130
12-07-2013  10444
15-07-2013  36198
16-07-2013  11268

I need to predict six months of product volume required in inventory after the end date of my data set ("14-06-2019", volume "3131076"). I have approximately six years of data, from 24-06-2013 to 14-06-2019.

I tried using auto.arima (R) on my data set and got many errors. I started researching ways to make my data suitable for time series analysis and came across the imputeTS and zoo packages.

I guess the dates matter for choosing the frequency value in the model, so I created a new column with the weekday of each date and counted how often each weekday occurs. The counts are not the same:

data1 <- mutate(data, day = weekdays(as.Date(Date)))
View(data1)
table(data1$day)
Friday    Monday  Saturday    Sunday  Thursday   Tuesday Wednesday 
      213       214       208       207       206       211       212 

There are no missing values against the dates that are present, but as the counts above show, each weekday does not occur equally often, so some dates are missing. How should I proceed with that? I have hit a dead end: I went through various posts here on the imputeTS and zoo packages but didn't have much success.

Can someone please guide me on how to proceed? Pardon me, @admins and users, if you think this is spam, but it is really important for me at the moment. I went through various time series tutorials elsewhere, but almost all of them use the AirPassengers data set, which I think has no flaws.

Regards RD

library(imputeTS)
library(dplyr)
library(forecast)

setwd("C:/Users/sittu/Downloads")

data <- read.csv("ts.csv")

str(data)
 $ Date  : Factor w/ 1471 levels "01-01-2014","01-01-2015",..: 1132 1181 1221 1272 1324 22 71 115 163 213 ...
 $ Volume: Factor w/ 1468 levels "0","1002551",..: 379 116 840 706 643 1095 1006 864 501 1254 ...

data$Volume <- as.numeric(data$Volume)
data$Date <- as.Date(data$Date, format = "%d/%m/%Y")

str(data)
'data.frame':   1471 obs. of  2 variables:
 $ Date  : Date, format: NA NA NA ...     ## 1st error: dates now show as NA
 $ Volume: num  379 116 840 706 643 ...

Upvotes: 0

Views: 78

Answers (1)

cicero

Reputation: 528

First, let's reproduce a dataset with missing data:

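library(dplyr)

set.seed(42)  # any seed works; it only makes the random sample reproducible

# One row per calendar day of 2018 with a random volume
dates <- seq(as.Date("2018-01-01"), as.Date("2018-12-31"), by = "day")
volume <- floor(runif(length(dates), min = 2500, max = 50000))
dummy_df <- data.frame(date = dates, Volume = volume)

# Keep a random 80% of the rows, i.e. drop 20% of the days
df <- dummy_df %>% sample_frac(0.8)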

Here we generated a data frame with date and Volume for the year 2018, with 20% of the days missing (sample_frac(0.8) keeps a random 80% of the rows).

This should mimic your dataset, with data missing for some days.
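Before the same steps can be applied to your real file, the two columns from your str() output have to be converted cleanly. Here is a minimal sketch, assuming the file really uses dd-mm-yyyy with dashes as in your sample; the raw data frame below is a hypothetical stand-in for read.csv("ts.csv"), not your actual data. It shows two things: the format string must use the same separator as the data, and a factor must go through character before becoming numeric, otherwise you get the level codes (the 379, 116, 840 in your second str() output) instead of the volumes.

```r
# Hypothetical stand-in for read.csv("ts.csv"): both columns arrive as factors,
# as in the str() output from the question.
raw <- data.frame(
  Date   = factor(c("24-06-2013", "25-06-2013", "26-06-2013")),
  Volume = factor(c("16986", "11438", "3378"))
)

# The sample uses dashes, so the format string needs dashes as well.
# "%d/%m/%Y" does not match the data, and as.Date() then returns NA.
raw$Date <- as.Date(as.character(raw$Date), format = "%d-%m-%Y")

# as.numeric() on a factor returns the internal level codes, not the values,
# so convert to character first.
raw$Volume <- as.numeric(as.character(raw$Volume))

str(raw)
```

Reading the file with read.csv("ts.csv", stringsAsFactors = FALSE) avoids the factor step for Date. If Volume still comes in as text after that, some rows probably contain non-numeric characters, which I cannot check from your sample.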

What we want from there is to find the days with no volume data:

Df_full_dates <- as.data.frame(dates) %>% 
                 left_join(df,by=c('dates'='date'))
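If you want to see exactly which days are absent before deciding how to fill them, something like the following sketch could help. It uses base R's merge() instead of left_join() so it runs without dplyr, and the tiny all_days and observed data frames are made up purely for illustration.

```r
# Toy data with one gap (2018-01-03 is absent); names and values are illustrative only.
all_days <- data.frame(
  dates = seq(as.Date("2018-01-01"), as.Date("2018-01-05"), by = "day")
)
observed <- data.frame(
  date   = as.Date(c("2018-01-01", "2018-01-02", "2018-01-04", "2018-01-05")),
  Volume = c(100, 200, 400, 500)
)

# Base-R equivalent of the left join: keep every calendar day,
# with NA in Volume where nothing was recorded.
joined <- merge(all_days, observed, by.x = "dates", by.y = "date", all.x = TRUE)

# The days that exist in the calendar but not in the data.
missing_days <- joined$dates[is.na(joined$Volume)]
print(missing_days)
```

Running weekdays() on missing_days would show whether the gaps are mostly weekends or holidays, which matters for choosing the fill value.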

Now you want to replace the NA values (which correspond to days with no data) with a volume. I used 0 here, but if the data is genuinely missing you might prefer the monthly average or some other specific value; I can't tell from your sample what suits your data best:

Df_full_dates[is.na(Df_full_dates)] <- 0
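As an illustration of the monthly-average option, here is a small sketch in base R. The data frame full and its values are invented for the example; only the idea (fill each gap with the mean of the observed values from the same month) carries over to your data.

```r
# Two months of made-up daily data with one artificial gap in each month.
dates <- seq(as.Date("2018-01-01"), as.Date("2018-02-28"), by = "day")
full  <- data.frame(dates = dates, Volume = as.numeric(seq_along(dates)))
full$Volume[c(5, 40)] <- NA

# Mean of the observed values per calendar month, repeated for every row.
month_key <- format(full$dates, "%Y-%m")
month_avg <- ave(full$Volume, month_key, FUN = function(v) mean(v, na.rm = TRUE))

# Fill only the gaps; observed values stay untouched.
gaps <- is.na(full$Volume)
full$Volume[gaps] <- month_avg[gaps]
```

The imputeTS package that you already load also offers interpolation-based fills such as na_interpolation(), which may suit a slowly changing inventory level better than a flat average. Which one is appropriate depends on why the days are missing.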

From there, you have a dataset with a value for each day, and you should be able to fit a model to predict the volume in future months.

Tell me if you have any questions.
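For the modelling step, a rough sketch with the forecast package you mentioned could look like this. It runs on simulated data, not yours, and frequency = 7 is my assumption that the series has a weekly pattern; if your real data only covers business days, as your sample suggests, frequency = 5 without filling weekends may be the better fit.

```r
library(forecast)

set.seed(1)
# Simulated stand-in for the gap-filled daily series (one value per calendar day).
dates  <- seq(as.Date("2018-01-01"), as.Date("2018-12-31"), by = "day")
volume <- floor(runif(length(dates), min = 2500, max = 50000))

# frequency = 7 is an assumption: it tells the model to look for a weekly pattern.
y <- ts(volume, frequency = 7)

# Let auto.arima() choose the model orders, then forecast ahead.
fit <- auto.arima(y)
fc  <- forecast(fit, h = 183)  # roughly six months of daily steps

head(fc$mean)  # point forecasts; fc$lower and fc$upper hold the intervals
```

With purely random input like this the forecast will be close to a flat line, which is expected. The point is only to show the shape of the calls once the series has no gaps.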

Upvotes: 1
