I am completely new to R and I'm trying to predict Weekly_Sales for a test dataset by training a neural network with the neuralnet package.
The data I have looks like this (train1):
Store Dept Date Temperature Fuel_Price MarkDown1 MarkDown2 MarkDown3 MarkDown4 MarkDown5 CPI Unemployment IsHoliday Rank Weekly_Sales
1 1 5/2/2010 42.31 2.572 -2000 -500 -100 -500 -700 211.0963582 8.106 0 13 24924.50
1 1 12/2/2010 38.51 2.548 -2000 -500 -100 -500 -700 211.2421698 8.106 1 13 46039.49
1 1 19/02/2010 39.93 2.514 -2000 -500 -100 -500 -700 211.2891429 8.106 0 13 41595.55
1 1 26/02/2010 46.63 2.561 -2000 -500 -100 -500 -700 211.3196429 8.106 0 13 19403.54
1 1 5/3/2010 46.50 2.625 -2000 -500 -100 -500 -700 211.3501429 8.106 0 13 21827.90
1 1 12/3/2010 57.79 2.667 -2000 -500 -100 -500 -700 211.3501429 8.106 0 13 21827.90
Splitting the data:
> ind <- sample(2, nrow(train1), replace = TRUE, prob = c(0.7, 0.3))
> train <- train1[ind == 1, ]
> test <- train1[ind == 2, ]
For train:
>head(train)
Store Dept Date Temperature Fuel_Price MarkDown1 MarkDown2 MarkDown3 MarkDown4 MarkDown5 CPI Unemployment IsHoliday Rank Weekly_Sales
1 1 5/2/2010 42.31 2.572 -2000 -500 -100 -500 -700 211.0963582 8.106 0 13 24924.50
1 1 26-02-2010 46.63 2.561 -2000 -500 -100 -500 -700 211.3196429 8.106 0 13 19403.54
1 1 5/3/2010 46.50 2.625 -2000 -500 -100 -500 -700 211.3501429 8.106 0 13 21827.90
1 1 19-03-2010 54.58 2.720 -2000 -500 -100 -500 -700 211.2156350 8.106 0 13 22136.64
1 1 26-03-2010 51.45 2.732 -2000 -500 -100 -500 -700 211.0180424 8.106 0 13 26229.21
1 1 2/4/2010 62.27 2.719 -2000 -500 -100 -500 -700 210.8204499 7.808 0 13 57258.43
For test:
>head(test)
Store Dept Date Temperature Fuel_Price MarkDown1 MarkDown2 MarkDown3 MarkDown4 MarkDown5 CPI Unemployment IsHoliday Rank Weekly_Sales
1 1 12/2/2010 38.51 2.548 -2000 -500 -100 -500 -700 211.2421698 8.106 1 13 46039.49
1 1 19-02-2010 39.93 2.514 -2000 -500 -100 -500 -700 211.2891429 8.106 0 13 41595.55
1 1 12/3/2010 57.79 2.667 -2000 -500 -100 -500 -700 211.3806429 8.106 0 13 21043.39
1 1 7/5/2010 72.55 2.835 -2000 -500 -100 -500 -700 210.3399684 7.808 0 13 17413.94
1 1 21-05-2010 76.44 2.826 -2000 -500 -100 -500 -700 210.6170934 7.808 0 13 14773.04
1 1 28-05-2010 80.44 2.759 -2000 -500 -100 -500 -700 210.8967606 7.808 0 13 15580.43
The code I use looks as follows:
>library(neuralnet)
>n <-neuralnet(Weekly_Sales~Temperature+Fuel_Price+MarkDown1+MarkDown2+MarkDown3+MarkDown4+MarkDown5+CPI+Unemployment+IsHoliday+Rank,data= train,hidden=c(4,3),err.fct="sse",linear.output=FALSE)
>plot(n)
>output <- compute(n,test[,4:14])
>output1 <- output$net.result*(max(test$Weekly_Sales)-min(test$Weekly_Sales))+min(test$Weekly_Sales)
The neural network trains, but it reports an error in the range of 10^13. Also, I get the same output every time I run the code, and the predictions are not even close to the actual Weekly_Sales in the test data. I have tried the data for another department but still get the same predictions.
Output:
>head(output$net.result)
[,1]
2 0.9999999998
3 0.9999999998
6 0.9999999998
14 0.9999999998
16 0.9999999998
17 0.9999999998
> head(output1)
[,1]
2 149743.97
3 149743.97
6 149743.97
14 149743.97
16 149743.97
17 149743.97
Reputation: 11955
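You need to normalize your data before you pass it to neuralnet(). Before splitting train1 into train/test, run the code below:
num_cols <- sapply(train1, is.numeric)  # Date is not numeric, so it cannot be scaled
maximum <- apply(train1[num_cols], 2, max)
minimum <- apply(train1[num_cols], 2, min)
train1_scaled <- train1
# a constant column (max == min) becomes NaN here and should be left out of the model
train1_scaled[num_cols] <- scale(train1[num_cols], center = minimum, scale = maximum - minimum)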
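One thing to watch with min-max scaling: a column whose values are all identical has max - min = 0, so scaling it gives NaN. In the rows shown in the question, MarkDown1 to MarkDown5 and Rank never change; whether that holds for the full data is not known from the post. Below is a minimal sketch on a toy data frame built from three of the posted rows (the column selection and the names `lo`, `hi`, `constant` are only for illustration) that detects constant columns and scales the rest:

```r
# Toy data: values copied from the first rows of the question
df <- data.frame(
  Temperature  = c(42.31, 38.51, 39.93),
  MarkDown1    = c(-2000, -2000, -2000),   # constant in the rows shown
  Weekly_Sales = c(24924.50, 46039.49, 41595.55)
)

lo <- sapply(df, min)
hi <- sapply(df, max)

# Naive min-max scaling: the constant column turns into NaN (0 / 0)
naive <- scale(df, center = lo, scale = hi - lo)

# Safer: flag constant columns and scale only the remaining ones
constant <- hi == lo
scaled <- as.data.frame(
  scale(df[!constant], center = lo[!constant], scale = (hi - lo)[!constant])
)

print(naive)
print(scaled)
```

Columns flagged in `constant` carry no information for the network anyway, so dropping them from the formula is usually the simplest option.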
Then split train1_scaled (not train1) into train and test with your existing code, and fit the network like this:
#linear.output should be TRUE because you are predicting a continuous dependent variable
n <- neuralnet(Weekly_Sales ~ Temperature + Fuel_Price + MarkDown1 + MarkDown2 + MarkDown3 + MarkDown4 + MarkDown5 + CPI + Unemployment + IsHoliday + Rank, data = train, hidden = c(4, 3), err.fct = "sse", linear.output = TRUE)
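This also explains the 0.9999999998 you got for every row. With linear.output=FALSE the activation function (logistic by default in neuralnet) is applied to the output neuron as well, so a prediction can never leave the interval (0, 1), while your unscaled targets are in the tens of thousands. The network pushes the output as close to 1 as it can and the squared error stays enormous. A rough base-R sketch of that effect, without fitting any network (`logistic`, `best_bounded` and `sse_floor` are illustrative names, not part of neuralnet):

```r
# Logistic activation, the default act.fct in neuralnet
logistic <- function(x) 1 / (1 + exp(-x))

# However large the input to the output neuron gets, the result never exceeds 1
saturated <- logistic(c(5, 20, 50))

# Unscaled targets taken from head(train) in the question
actual <- c(24924.50, 19403.54, 21827.90)

# Best case for a (0, 1)-bounded output: predict the upper bound for every row
best_bounded <- rep(1, length(actual))
sse_floor <- sum((actual - best_bounded)^2)

print(saturated)
print(sse_floor)
```

Even for these three rows the squared error is above 10^9, so a very large total over the whole training set is to be expected with unscaled targets.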
The code after that also needs a slight modification. First recompute output with compute() on the scaled test set, then:
#to convert the predictions back to the original scale, use the min/max of the original non-scaled data (train1), not of the 'test' dataset
output1 <- output$net.result * (max(train1$Weekly_Sales) - min(train1$Weekly_Sales)) + min(train1$Weekly_Sales)
#the dependent variable in the (now scaled) test dataset needs the same conversion
test$Weekly_Sales_nonScaled <- test$Weekly_Sales * (max(train1$Weekly_Sales) - min(train1$Weekly_Sales)) + min(train1$Weekly_Sales)
#after this you can compare the original data (test$Weekly_Sales_nonScaled) with the predicted data (output1)
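To make the back-conversion and the final comparison concrete, here is a small sketch in base R. The three sales figures come from head(test) in the question; the scaled predictions are made-up numbers standing in for output$net.result, and `unscale` and `rmse` are hypothetical helper names. For simplicity the min and max are taken from these three values, whereas in your script they come from train1.

```r
# Undo min-max scaling: x_scaled * (max - min) + min
unscale <- function(x, lo, hi) x * (hi - lo) + lo

# Actual Weekly_Sales values from head(test) in the question
sales <- c(46039.49, 41595.55, 21043.39)
lo <- min(sales)
hi <- max(sales)

# Scale, then convert back: this should reproduce the original values
sales_scaled <- (sales - lo) / (hi - lo)
restored <- unscale(sales_scaled, lo, hi)

# Hypothetical scaled predictions (stand-in for output$net.result)
predicted_scaled <- c(0.8, 0.7, 0.1)
predicted <- unscale(predicted_scaled, lo, hi)

# Root mean squared error on the original scale
rmse <- sqrt(mean((sales - predicted)^2))

print(restored)
print(predicted)
print(rmse)
```

Computing the error on the original scale keeps it in the same units as Weekly_Sales, which is easier to interpret than the error on the 0 to 1 scale.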
Kindly don't forget to let us know if it helped :)