FairyOnIce

Reputation: 2614

R function nnet: what exactly does the "weights" argument do?

To fit a neural network to a dataset with the R function nnet, I learned that when the cases are unevenly distributed across classes, I should weight each case appropriately (http://cowlet.org/2014/01/12/understanding-data-science-classification-with-neural-networks-in-r.html).

The R function nnet has a "weights" argument, and I would like to know exactly what it does. The help file only says "(case) weights for each example – if missing defaults to 1", which is not very clear to me. I originally thought that the weights affect only the choice of the classification threshold and not the back-propagation algorithm. However, my naive guess seems to be incorrect. To see this, I generated a very simple dataset with two unevenly distributed classes:

 library(nnet)

 p1 <- 0.05
 p2 <- 1 - p1
 Ntot <- 2000
 class <- sample(1:2,Ntot,prob=c(p1,p2),replace=TRUE)
 dat <- scale(cbind(f1=rnorm(Ntot,mean=class), f2=rnorm(Ntot,mean=class,sd=0.01)))

Then I fitted two nnet models: one with each case weighted by the sampling proportion of its class (p1 or p2), and another with all weights equal to 1.

 myWeight <- rep(NA,length(class))
 myWeight[class==1] <- p1
 myWeight[class==2] <- p2
 set.seed(1)
 fitw <- nnet(class~.,data=dat,weights=myWeight,size=3,decay=0.1)
 set.seed(1)
 fit0 <- nnet(class~.,data=dat,size=3,decay=0.1)

Now I compute the fitted response values (ranging between 0 and 1) from each model.

 pred.raw.w <- predict(fitw,type="raw")
 pred.raw0 <- predict(fit0,type="raw")

 head(pred.raw.w)
 head(pred.raw0)

If my naive guess were true, I would have seen the same raw response estimates from both fits. Instead, the two sets of response values are different! This means that the weights must enter the back-propagation computation (and not just the threshold). Can anyone tell me what exactly the weights do, or direct me to a reference?

Upvotes: 1

Views: 2891

Answers (1)

dylanjf

Reputation: 116

'Case weights' refers to importance weighting of each observation. Weights can be used to make the ML algorithm focus on certain parts of the data: an observation with a larger weight counts for more when the model is fitted.
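To make that concrete, here is a minimal sketch of what a case-weighted fitting criterion looks like. It is not nnet's actual source: the function `weighted_sse` and the single logistic unit are assumptions made for illustration, on the understanding that nnet multiplies each case's contribution to its fitting criterion by that case's weight. Under that assumption, an integer weight behaves like repeating the case, so the weights change the quantity being minimised (and hence every gradient step), not just a threshold applied afterwards.

```r
# Sketch of a case-weighted criterion (assumed form, not nnet's source code).
sigmoid <- function(z) 1 / (1 + exp(-z))

# Weighted sum of squared errors for a single logistic unit.
# `par` holds an intercept followed by one coefficient per column of X.
weighted_sse <- function(par, X, y, w = rep(1, length(y))) {
  pred <- as.vector(sigmoid(par[1] + X %*% par[-1]))
  sum(w * (y - pred)^2)
}

set.seed(1)
X <- matrix(rnorm(20), ncol = 2)   # 10 made-up cases, 2 features
y <- rbinom(10, 1, 0.5)            # made-up 0/1 targets
par <- c(0.1, -0.3, 0.5)           # arbitrary parameter values

# Giving case 1 a weight of 3 ...
w <- c(3, rep(1, 9))
loss_weighted <- weighted_sse(par, X, y, w)

# ... gives the same criterion as listing case 1 three times with unit weights.
idx <- c(1, 1, 1, 2:10)
loss_replicated <- weighted_sse(par, X[idx, ], y[idx])

all.equal(loss_weighted, loss_replicated)
```

Because the two criteria agree at any parameter values, their gradients agree as well, which would explain why fits with and without weights give different raw predictions.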

Take, for example, the problem of forecasting sales for a store. It might be more important to forecast sales accurately around weekends and holidays, as most of a store's sales volume occurs at those times. You can then supply a column of weights that is 1 for weekdays and 2 for weekends/holidays.
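A weight column like that could be built roughly as follows. The dates and the holiday list are made up for illustration, and `case_weights` is just a hypothetical name:

```r
# Hypothetical two-week sales calendar; dates and holidays are illustrative only.
dates <- seq(as.Date("2024-01-01"), as.Date("2024-01-14"), by = "day")
holidays <- as.Date("2024-01-01")      # assumed holiday

wday <- as.POSIXlt(dates)$wday         # 0 = Sunday, ..., 6 = Saturday
is_peak <- wday %in% c(0, 6) | dates %in% holidays

# Weight 2 for weekends/holidays, weight 1 for ordinary weekdays.
case_weights <- ifelse(is_peak, 2, 1)
table(case_weights)
```

The resulting vector has one entry per row of the training data and can be passed to the fitting function, for example as `weights = case_weights` in a call to nnet.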

Upvotes: 1
