Reputation: 2614
To fit a neural network to a dataset using the R function nnet, I learned that when the cases are unevenly distributed across classes, I should weight each case appropriately (http://cowlet.org/2014/01/12/understanding-data-science-classification-with-neural-networks-in-r.html).
The nnet function has a "weights" argument, and I would like to know exactly what it does. The help file only says "(case) weights for each example – if missing defaults to 1", which is not very clear to me. I originally thought that the weights affect only the choice of threshold and not the back-propagation algorithm. However, my naive guess seems to be incorrect. To check this, I generated a very simple two-class dataset with unevenly distributed classes:
library(nnet)
p1 <- 0.05
p2 <- 1 - p1
Ntot <- 2000
class <- sample(1:2,Ntot,prob=c(p1,p2),replace=TRUE)
dat <- scale(cbind(f1=rnorm(Ntot,mean=class), f2=rnorm(Ntot,mean=class,sd=0.01)))
I then fitted two nnet models: one with case weights set to each case's class proportion (p1 for class 1, p2 for class 2), and another with all weights equal to 1.
myWeight <- rep(NA,length(class))
myWeight[class==1] <- p1
myWeight[class==2] <- p2
set.seed(1)
fitw <- nnet(class~.,data=dat,weights=myWeight,size=3,decay=0.1)
set.seed(1)
fit0 <- nnet(class~.,data=dat,size=3,decay=0.1)
Now I estimate the response values (ranging between 0 and 1).
pred.raw.w <- predict(fitw,type="raw")
pred.raw0 <- predict(fit0,type="raw")
head(pred.raw.w)
head(pred.raw0)
If my naive guess were true, I would have seen the same raw response estimates from both fits. Instead, the two sets of response values are different! This means that the weights must enter the back-propagation computation itself (and not just the threshold). Can anyone tell me what exactly the weights argument does, or direct me to a reference?
Upvotes: 1
Views: 2891
Reputation: 116
'Case weights' refers to importance weighting of each observation: a case with a larger weight contributes more to the error that the fit minimizes. Weights can be used to tailor the ML algorithm to focus on certain aspects of the data.
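As a rough sketch of what this means for the fitting criterion (my understanding of how nnet treats case weights, not something stated in its help file), the per-case error is multiplied by that case's weight before summing. The symbols below are illustrative notation of mine: $w_i$ is the case weight, $\ell$ the per-case error (squared error or entropy), $\theta$ the network parameters, and $\lambda$ the `decay` value.

```latex
E(\theta) = \sum_{i=1}^{N} w_i \, \ell\bigl(y_i, \hat{y}_i(\theta)\bigr) + \lambda \sum_{j} \theta_j^{2}
```

Because the gradient of $E$ also carries the factor $w_i$ for each case, the weights change the fitted parameters themselves and not only a downstream threshold, which would be consistent with the differing raw predictions reported in the question.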
Take, for example, the problem of forecasting sales for a store. It might be more important to predict sales accurately around weekends and holidays, since most of a store's sales volume occurs during those times. You can then supply a column of weights that is '1' for weekdays and '2' for weekends/holidays.
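To make the weekday/weekend idea concrete, here is a minimal base-R sketch. It does not call nnet; it uses a hypothetical one-unit logistic model and a hand-written weighted sum-of-squares criterion (`weighted_sse` is a name I made up, and all data values are invented) to show that a weight of 2 has the same effect on the criterion as including that case twice with weight 1.

```r
# Toy data; all values are made up for illustration.
x <- c(-2, -1, 0, 1, 2, 3)                 # a single standardized feature
y <- c(0, 0, 1, 0, 1, 1)                   # binary target
is_weekend <- c(FALSE, FALSE, TRUE, FALSE, FALSE, TRUE)
w <- ifelse(is_weekend, 2, 1)              # weekends count double

# Weighted sum-of-squares criterion for a single logistic unit.
# par[1] is the intercept, par[2] the slope (hypothetical model).
weighted_sse <- function(par, x, y, w) {
  p <- plogis(par[1] + par[2] * x)         # predicted response in (0, 1)
  sum(w * (y - p)^2)                       # each case scaled by its weight
}

par <- c(0.3, -0.7)                        # arbitrary parameter values

# Same criterion computed by replicating each case w[i] times
# and giving every replicated row a weight of 1.
idx <- rep(seq_along(x), times = w)
loss_weighted   <- weighted_sse(par, x, y, w)
loss_replicated <- weighted_sse(par, x[idx], y[idx], rep(1, length(idx)))

print(all.equal(loss_weighted, loss_replicated))
```

Since the two criteria agree for any parameter values, minimizing one is the same as minimizing the other; non-integer weights generalize this "replication" idea to fractional counts.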
Upvotes: 1