This is the first time I am working on time series, so kindly pardon me if anything in my approach is wrong.
I have monthly sales data for multiple Groups. The data covers 3 years, and I would like to apply time series analysis to it. I am not sure whether 3 years of data is actually enough, but I would like to understand it better anyway.
My current understanding is that a time series is decomposed into three parts: Trend, Seasonality and Random.
I want to split the Sales data for each Group into its Trend, Seasonality and Random parts. Once the trend and seasonality are removed, I want to use only the Random part to understand the Sales metrics better.
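For reference, the two classical decomposition models can be written as below, with $Y_t$ the sales in month $t$, $T_t$ the trend, $S_t$ the seasonal component and $R_t$ the random (remainder) component. "Removing" trend and seasonality means subtracting them in the additive model and dividing them out in the multiplicative one, so a multiplicative random component fluctuates around 1 rather than around 0:

```latex
\begin{aligned}
\text{additive:}\quad       & Y_t = T_t + S_t + R_t
  &&\Longrightarrow\quad R_t = Y_t - T_t - S_t \\
\text{multiplicative:}\quad & Y_t = T_t \times S_t \times R_t
  &&\Longrightarrow\quad R_t = \frac{Y_t}{T_t \, S_t}
\end{aligned}
```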
Since the data is monthly, I think I need to use a multiplicative model. Should I use stl or decompose?
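On the stl vs decompose question: monthly data does not by itself call for a multiplicative model. What matters is whether the seasonal swings grow with the level of the series (multiplicative) or stay roughly constant in size (additive). decompose() supports both through its type argument, while stl() fits an additive model only, so a common workaround is to run stl() on the log of the series and exponentiate the components. A minimal sketch of both, using R's built-in AirPassengers series as a stand-in (an assumption; it is not the asker's data):

```r
# Stand-in series: the built-in monthly AirPassengers data (NOT the asker's sales),
# whose seasonal swings grow with the level, i.e. a multiplicative pattern
y <- AirPassengers

# Classical decomposition with a multiplicative model: y = trend * seasonal * random
dm <- decompose(y, type = "multiplicative")

# stl() only fits an additive model, so fit it to log(y):
# log(y) = log(trend) + log(seasonal) + log(random)
fit <- stl(log(y), s.window = "periodic")

# Back-transform the STL components to the original (multiplicative) scale
stl_trend    <- exp(fit$time.series[, "trend"])
stl_seasonal <- exp(fit$time.series[, "seasonal"])
stl_random   <- exp(fit$time.series[, "remainder"])

# decompose() leaves NAs at both ends of trend and random; stl() does not
print(sum(is.na(dm$random)))   # 12
print(sum(is.na(stl_random)))  # 0
```

Besides keeping the end months, stl() can let the seasonal pattern change over time through s.window, whereas decompose() always estimates one fixed seasonal pattern. With only three years per group there is little data to estimate a changing pattern, so s.window = "periodic" (a fixed pattern) is a reasonable starting point.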
I have the basic decompose code, but I am not sure how to extend it to multiple groups, so that I get the trend, seasonality and random components for each group.
I am not referring to an ARIMA model; I mean the standard (classical) decomposition approach.
Below is what my data looks like.
Group Date Month Sales
Group1 Jan-15 1 75030
Group1 Feb-15 2 16073
Group1 Mar-15 3 17161
Group1 Apr-15 4 94946
Group1 May-15 5 62999
Group1 Jun-15 6 4698
Group1 Jul-15 7 76743
Group1 Aug-15 8 28800
Group1 Sep-15 9 12225
Group1 Oct-15 10 71793
Group1 Nov-15 11 26686
Group1 Dec-15 12 6252
Group1 Jan-16 13 82698
Group1 Feb-16 14 71201
Group1 Mar-16 15 65798
Group1 Apr-16 16 4407
Group1 May-16 17 7491
Group1 Jun-16 18 24366
Group1 Jul-16 19 99616
Group1 Aug-16 20 74443
Group1 Sep-16 21 54122
Group1 Oct-16 22 20762
Group1 Nov-16 23 91376
Group1 Dec-16 24 18693
Group1 Jan-17 25 30395
Group1 Feb-17 26 82049
Group1 Mar-17 27 79701
Group1 Apr-17 28 38862
Group1 May-17 29 84802
Group1 Jun-17 30 81715
Group1 Jul-17 31 60786
Group1 Aug-17 32 88731
Group1 Sep-17 33 28502
Group1 Oct-17 34 79245
Group1 Nov-17 35 15553
Group1 Dec-17 36 3237
Group2 Jan-15 1 8990
Group2 Feb-15 2 47516
Group2 Mar-15 3 15076
Group2 Apr-15 4 60888
Group2 May-15 5 47111
Group2 Jun-15 6 7770
Group2 Jul-15 7 25080
Group2 Aug-15 8 46586
Group2 Sep-15 9 12595
Group2 Oct-15 10 71883
Group2 Nov-15 11 21634
Group2 Dec-15 12 78799
Group2 Jan-16 13 57596
Group2 Feb-16 14 35685
Group2 Mar-16 15 68518
Group2 Apr-16 16 35661
Group2 May-16 17 65294
Group2 Jun-16 18 62602
Group2 Jul-16 19 13506
Group2 Aug-16 20 49215
Group2 Sep-16 21 32008
Group2 Oct-16 22 27924
Group2 Nov-16 23 56146
Group2 Dec-16 24 23975
Group2 Jan-17 25 18686
Group2 Feb-17 26 77076
Group2 Mar-17 27 63992
Group2 Apr-17 28 38087
Group2 May-17 29 19846
Group2 Jun-17 30 46823
Group2 Jul-17 31 11035
Group2 Aug-17 32 73686
Group2 Sep-17 33 35523
Group2 Oct-17 34 97417
Group2 Nov-17 35 27954
Group2 Dec-17 36 79004
Below is my code.
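# Basic decomposition for a single group (Group1); ts(df, ...) would turn all columns into one multivariate series
x <- ts(df$Sales[df$Group == "Group1"], start = c(2015, 1), frequency = 12)
m <- decompose(x, type = "multiplicative")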
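Once a group has been decomposed, the trend, seasonal and random parts can be read straight off the object that decompose() returns. One caveat for a 3-year series: the trend is a centred 12-month moving average, so the first and last 6 months have no trend and therefore no random value, leaving 24 of the 36 months. A sketch for Group1, with the sales typed in from the table above (sales1, trend1, seasonal1 and random1 are illustrative names):

```r
# Group1 monthly sales, Jan-15 .. Dec-17, copied from the table in the question
sales1 <- c(75030, 16073, 17161, 94946, 62999,  4698, 76743, 28800, 12225, 71793, 26686,  6252,
            82698, 71201, 65798,  4407,  7491, 24366, 99616, 74443, 54122, 20762, 91376, 18693,
            30395, 82049, 79701, 38862, 84802, 81715, 60786, 88731, 28502, 79245, 15553,  3237)
x <- ts(sales1, start = c(2015, 1), frequency = 12)
m <- decompose(x, type = "multiplicative")

# The pieces of the "decomposed.ts" object are ts objects aligned with x
trend1    <- m$trend
seasonal1 <- m$seasonal   # the 12 monthly factors in m$figure, repeated
random1   <- m$random     # x / (seasonal * trend)

# The centred 2x12 moving average behind the trend loses 6 months at each end,
# so only months 7 to 30 (Jul-15 to Jun-17) get a random value
range(which(!is.na(as.numeric(random1))))  # 7 30
```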
Please correct me if there is something wrong in my approach, since I am new to time series modelling.
Thanks,
Jay
The first column (Group) is a factor, so you can use the tapply function to extract one time series per Group. The results will be stored in a list. Then you can use lapply with two arguments: the list of time series and the function decompose.
To access the results of the decomposition you can index the list, e.g. dcs[[1]] will extract the decomposition for Group1.
Data:
df <- structure(list(Group = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L), .Label = c("Group1", "Group2"), class = "factor"), Date = structure(c(13L,
10L, 22L, 1L, 25L, 19L, 16L, 4L, 34L, 31L, 28L, 7L, 14L, 11L,
23L, 2L, 26L, 20L, 17L, 5L, 35L, 32L, 29L, 8L, 15L, 12L, 24L,
3L, 27L, 21L, 18L, 6L, 36L, 33L, 30L, 9L, 13L, 10L, 22L, 1L,
25L, 19L, 16L, 4L, 34L, 31L, 28L, 7L, 14L, 11L, 23L, 2L, 26L,
20L, 17L, 5L, 35L, 32L, 29L, 8L, 15L, 12L, 24L, 3L, 27L, 21L,
18L, 6L, 36L, 33L, 30L, 9L), .Label = c("Apr-15", "Apr-16", "Apr-17",
"Aug-15", "Aug-16", "Aug-17", "Dec-15", "Dec-16", "Dec-17", "Feb-15",
"Feb-16", "Feb-17", "Jan-15", "Jan-16", "Jan-17", "Jul-15", "Jul-16",
"Jul-17", "Jun-15", "Jun-16", "Jun-17", "Mar-15", "Mar-16", "Mar-17",
"May-15", "May-16", "May-17", "Nov-15", "Nov-16", "Nov-17", "Oct-15",
"Oct-16", "Oct-17", "Sep-15", "Sep-16", "Sep-17"), class = "factor"),
Month = c(1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L, 12L,
13L, 14L, 15L, 16L, 17L, 18L, 19L, 20L, 21L, 22L, 23L, 24L,
25L, 26L, 27L, 28L, 29L, 30L, 31L, 32L, 33L, 34L, 35L, 36L,
1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L, 12L, 13L, 14L,
15L, 16L, 17L, 18L, 19L, 20L, 21L, 22L, 23L, 24L, 25L, 26L,
27L, 28L, 29L, 30L, 31L, 32L, 33L, 34L, 35L, 36L), Sales = c(75030L,
16073L, 17161L, 94946L, 62999L, 4698L, 76743L, 28800L, 12225L,
71793L, 26686L, 6252L, 82698L, 71201L, 65798L, 4407L, 7491L,
24366L, 99616L, 74443L, 54122L, 20762L, 91376L, 18693L, 30395L,
82049L, 79701L, 38862L, 84802L, 81715L, 60786L, 88731L, 28502L,
79245L, 15553L, 3237L, 8990L, 47516L, 15076L, 60888L, 47111L,
7770L, 25080L, 46586L, 12595L, 71883L, 21634L, 78799L, 57596L,
35685L, 68518L, 35661L, 65294L, 62602L, 13506L, 49215L, 32008L,
27924L, 56146L, 23975L, 18686L, 77076L, 63992L, 38087L, 19846L,
46823L, 11035L, 73686L, 35523L, 97417L, 27954L, 79004L)), class = "data.frame", row.names = c(NA,
-72L))
Code:
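# One monthly time series per Group, returned as a list
tss <- tapply(df$Sales, df$Group, ts, start = c(2015, 1), frequency = 12)
# Decompose each series; decompose() is additive unless type = "multiplicative" is passed
dcs <- lapply(tss, decompose)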
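Since the question asks for a multiplicative model and for the random part of each group, here is a sketch of the same approach with type = "multiplicative" passed through lapply(), plus a step that stacks every group's random component into one data frame for further analysis. It uses split() in place of tapply() (my choice, not the answer's; both give one series per group), and the names sales1, sales2 and random_df are hypothetical:

```r
# Monthly sales for both groups, copied from the table in the question
sales1 <- c(75030, 16073, 17161, 94946, 62999,  4698, 76743, 28800, 12225, 71793, 26686,  6252,
            82698, 71201, 65798,  4407,  7491, 24366, 99616, 74443, 54122, 20762, 91376, 18693,
            30395, 82049, 79701, 38862, 84802, 81715, 60786, 88731, 28502, 79245, 15553,  3237)
sales2 <- c( 8990, 47516, 15076, 60888, 47111,  7770, 25080, 46586, 12595, 71883, 21634, 78799,
            57596, 35685, 68518, 35661, 65294, 62602, 13506, 49215, 32008, 27924, 56146, 23975,
            18686, 77076, 63992, 38087, 19846, 46823, 11035, 73686, 35523, 97417, 27954, 79004)
df <- data.frame(Group = factor(rep(c("Group1", "Group2"), each = 36)),
                 Month = rep(1:36, times = 2),
                 Sales = c(sales1, sales2))

# One monthly ts per group; split() returns a list named by Group
tss <- lapply(split(df$Sales, df$Group), ts, start = c(2015, 1), frequency = 12)

# Extra arguments after the function are passed on to decompose()
dcs <- lapply(tss, decompose, type = "multiplicative")

# Stack each group's random component into one long data frame
random_df <- do.call(rbind, lapply(names(dcs), function(g) {
  data.frame(Group  = g,
             Month  = seq_along(dcs[[g]]$random),
             Random = as.numeric(dcs[[g]]$random))
}))

# Drop the months without a random value (first and last 6 of each group)
head(random_df[!is.na(random_df$Random), ])
```

Because the list is named, dcs$Group1 and dcs[["Group1"]] work as well as dcs[[1]], and plot(dcs$Group1) draws the usual four-panel decomposition plot.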