Reputation: 7
I have a R dataframe which describes the evolution of the sales of a product in approx. 2000 shops in a quarterly basis, with 5 columns (ie. 5 periods of time). I'd like to know how to analyse it with R.
I've already tried to make some basic analysis, that is to say to determine the average sales for the 1st period, the 2nd period, etc. and then determine the average for each period and then to compare the evolution of each shop relatively to this general evolution. For instance, there is a total of 50 000 sales for the 1st period and 35 000 for the 5th, so I assume that for each shop the normal sale in the 5th period is to be 35/55=0.63*the amount of the 1st period's sale: if the shop X has sold 100 items in the first period, I assume that it should normally sell 63 items in the 5th period.
Obviously, this is an easy-to-do method, but it is not statistically relevant.
I would like a method which would enable me to determine a trend curb which miminizes my R-square. My objective is to be able to analyse the sales of the shops by neutralizing the general trend: I'd like to know precisely what are the underperforming shops and what are the overperforming shops, with a statistically correct approach.
My dataframe is structured in this way :
shopID | sum | qt1 | qt2 | qt3 | qt4 | qt5
000001 | 150 | 45 | 15 | 40 | 25 | 25
000002 | 100 | 20 | 20 | 20 | 20 | 20
000003 | 500 | 200 | 0 | 100 | 100 | 100
... (2200 rows)
I've tried to put my timeserie in a list, which is successful, with the following functon:
reversesales=t(data.frame(sales$qt1,sales$qt2,sales$qt3,sales$qt4,sales$qt5))
# I reverse rows and columns of the frame in order that the time periods be the rows
timeser<-ts(reversesales,start=1,end=5, deltat=1/4)
# deltat=1/4 because it is a quarterly basis, 1 and 5 because I have 5 quarters
Still, I am unable to do anything with this variable. I can't do any plot (with the "plot" function) as there are 2200 rows (and so R wants to make me 2200 successive plots, obviously this is not what I want).
In addition, I don't know how to determine the theoretical trend and the theoretical value of the sales for each period for each shop...
Thank you for your help! (and merry Christmas)
Upvotes: 0
Views: 775
Reputation: 9618
An implementation of mixed model:
install.packages("nlme")
library("nlme")
library(dplyr)
# Generating some data with a structure like yours:
start <- round(sample(10:100, 50, replace = TRUE)*runif(50))
df <- data_frame(shopID = 1:50, qt1 = start, qt2 =round(qt1*runif(50, .5, 2)) ,qt3 = round(qt2*runif(50, .5, 2)), qt4 = round(qt3*runif(50, .5, 2)), qt5 = round(qt4*runif(50, .5, 2)))
df <- as.data.frame(df)
# Converting in into the long format:
df <- reshape(df, idvar = "shopID", varying = names(df)[-1], direction = "long", sep = "")
Estimating the model:
mod <- lme(qt ~ time, random = ~ time | shopID, data = df)
# Extract the random effects for comparison:
random.effects(mod)
(Intercept) time
1 74.0790805 3.7034172
2 7.8713699 4.2138001
3 -8.0670810 -5.8754060
4 -16.5114428 16.4920663
5 -16.7098229 6.4685228
6 -11.9630688 -8.0411504
7 -12.9669777 21.3071366
8 -24.1099280 32.9274361
9 8.5107335 -9.7976905
10 -13.2707679 -6.6028927
11 3.6206163 -4.1017784
12 21.2342886 -6.7120725
13 -14.6489512 11.6847109
14 -14.7291647 2.1365768
15 10.6791941 3.2097199
16 -14.1524187 -1.6933291
17 5.2120647 8.0119320
18 -2.5172933 -6.5011416
19 -9.0094366 -5.6031271
20 1.4857512 -5.9913865
21 -16.5973442 3.5164298
22 -26.7724763 27.9264081
23 49.0764631 -12.9800871
24 -0.1512509 2.3589947
25 15.7723150 -7.9295698
26 2.1955489 11.0318875
27 -8.0890346 -5.4145977
28 0.1338790 -8.3551182
29 9.7113758 -9.5799588
30 -6.0257683 42.3140432
31 -15.7655545 -8.6226255
32 -4.1450984 18.7995079
33 4.1510104 -1.6384103
34 2.5107652 -2.0871890
35 -23.8640815 7.6680185
36 -10.8228653 -7.7370976
37 -14.1253093 -8.1738468
38 42.4114024 -9.0436585
39 -10.7453627 2.4590883
40 -12.0947901 -5.2763010
41 -7.6578305 -7.9630013
42 -14.9985612 -0.4848326
43 -13.4081771 -7.2655456
44 -11.5646620 -7.5365387
45 6.9116844 -10.5200339
46 70.7785492 -11.5522014
47 -7.3556367 -8.3946072
48 27.3830419 -6.9049164
49 14.3188079 -9.9334156
50 -15.2077850 -7.9161690
I would interpret the values as follows: consider them as a deviation from zero, so that positive values are positive deviations from the average, whereas negative values are negative deviation from the average. The averages of the two columns are zero, as is checked below:
round(apply(random.effects(mod), 2, mean))
(Intercept) time
0 0
Upvotes: 1
Reputation:
library(zoo)
#Reconstructing the data with four quarter columns (instead of five quarters as in your example)
shopID <- c(1, 2, 3, 4, 5)
sum <- c(150, 100, 500, 350, 50)
qt1 <- c(40, 10, 130, 50, 10)
qt2 <- c(40, 40, 110, 100, 15)
qt3 <- c(50, 30, 140, 150, 10)
qt4 <- c(20, 20, 120, 50, 15)
myDF <- data.frame(shopID, sum, qt1, qt2, qt3, qt4)
#The ts() function converts a numeric vector into an R time series object
ts1 <- ts(as.numeric((myDF[1,3:6])), frequency=4)
ts2 <- ts(as.numeric((myDF[2,3:6])), frequency=4)
ts3 <- ts(as.numeric((myDF[3,3:6])), frequency=4)
ts4 <- ts(as.numeric((myDF[4,3:6])), frequency=4)
ts5 <- ts(as.numeric((myDF[5,3:6])), frequency=4)
#Merge time series objects
tsm <- merge(a = as.zoo(ts1), b = as.zoo(ts2), c = as.zoo(ts3), d = as.zoo(ts4), e = as.zoo(ts5))
#Plotting the Time Series
plot.ts(tsm, plot.type = "single", lty = 1:5, xlab = "Time", ylab = "Sales")
The code is not optimized, and can be improved. More about time series analysis can be read here. Hope this gives some direction.
Upvotes: 1