I would like to run a linear regression in Vowpal Wabbit using the null model (intercept only, for comparison purposes). Which optimizer should I use for this? Also, is the reported best constant loss that of the simple average?
A1: For linear regression, if you care about averages, you should use `--loss_function squared` (which is the default). If you care more about the median than the average (e.g. if you have outliers that may greatly distort the average), use `--loss_function quantile`. BTW: these are not optimizers, just loss functions. I would leave the optimizer (enhanced SGD) at its default, since it works very well.
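To see why an outlier matters for that choice, here is a small sketch in plain shell (POSIX `sh`, `sort` and `awk`, not Vowpal Wabbit itself). It computes the mean and the median of a few hypothetical labels, one of which is an outlier. Squared loss is minimized by the mean and quantile loss with tau = 0.5 by the median, so these two numbers show which constant each loss function would pull a prediction toward. The label values are made up for illustration.

```shell
#!/bin/sh
# Hypothetical labels; 100 is a deliberate outlier.
labels='1
2
3
4
100'

# Mean: the constant that minimizes squared loss.
mean=$(printf '%s\n' "$labels" | awk '{ s += $1; n++ } END { print s / n }')

# Median: a constant that minimizes quantile loss with tau = 0.5.
# For an even count this picks the lower middle value, which is still a minimizer.
median=$(printf '%s\n' "$labels" | sort -n | awk '{ v[NR] = $1 } END { print v[int((NR + 1) / 2)] }')

echo "mean=$mean median=$median"
```

Here the single outlier drags the mean up to 22, while the median stays at 3.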
A2: `best constant` is the constant prediction that would give the lowest error, and `best constant loss` is the average error you get by always predicting that best constant number. With the default squared loss, the best constant is the weighted average of all your target values (labels), which reduces to the simple average when every example has weight 1. This is not the same as the intercept `B` in the linear-regression formula `y = Σ Ai*xi + B`. `B` is the free term, independent of the inputs, and it is not necessarily the average of the `y`s.
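As a rough illustration of A2, the sketch below (POSIX `sh` and `awk`, not VW itself) reads a few hypothetical examples written in VW's `label [importance_weight] | features` text format and computes three things: the best constant under squared loss (the weighted mean of the labels), its loss (the weighted mean of `(y - c)^2`), and the weighted least-squares intercept `B` of a one-feature linear fit. The data, the single feature name `x`, and the absence of tags are assumptions made for the example; it also assumes VW reports squared loss as plain `(prediction - label)^2` averaged with the importance weights, so compare against your own VW output before relying on it.

```shell
#!/bin/sh
# Hypothetical VW-format data: "label [importance_weight] | x:value".
# Assumes no tags and a single feature named x; the third example has weight 2.
# The points lie exactly on y = 2x + 1.
data='3 | x:1
5 | x:2
7 2 | x:3
9 | x:4'

results=$(printf '%s\n' "$data" | awk '
{
    split($0, parts, "|")
    n = split(parts[1], head, " ")
    y = head[1]                         # label
    w = (n >= 2) ? head[2] : 1          # importance weight defaults to 1
    split(parts[2], feat, ":")
    x = feat[2] + 0                     # value of the single feature x
    sw += w; swy += w * y; swyy += w * y * y
    swx += w * x; swxx += w * x * x; swxy += w * x * y
}
END {
    c = swy / sw                        # best constant: weighted mean of labels
    loss = swyy / sw - c * c            # equals the weighted mean of (y - c)^2
    slope = (swxy - swx * swy / sw) / (swxx - swx * swx / sw)
    b = c - slope * (swx / sw)          # weighted least-squares intercept B
    print c, loss, b
}')

set -- $results
best_constant=$1; best_constant_loss=$2; intercept=$3
echo "best constant=$best_constant loss=$best_constant_loss intercept B=$intercept"
```

Because the points lie on `y = 2x + 1`, the intercept comes out as 1 while the best constant is 6.2 (with a loss of 4.16), which is exactly the gap between `B` and the average of the `y`s described above.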
A3: If you want to find the intercept of your model, look for the weight named `Constant` in your model. This takes two short steps:
```shell
# 1) Train your model from the dataset
# and save the model in human-readable (aka "inverted hash") format
vw --invert_hash model.ih your_dataset
# 2) Search for the free/intercept term in the readable model
grep '^Constant:' model.ih
```
The output of the `grep` step should be something like:

```
Constant:116060:-1.085126
```

where `116060` is the hash slot (the location in the model) and `-1.085126` is the value of the intercept (assuming no hash collisions and a model that is a linear combination of the inputs).
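If you need the intercept as a number in a script (for example, to compare it against the null model), a small sketch in the same shell style can pull it out with `awk` instead of `grep`. It relies only on the `name:hash_slot:weight` layout shown in the output above; the `sample` variable is a hypothetical stand-in for a real `model.ih`.

```shell
#!/bin/sh
# Stand-in for model.ih: one line in the same format as the grep output above.
# With a real model you would run: awk -F: '$1 == "Constant" { print $NF }' model.ih
sample='Constant:116060:-1.085126'

# Each line is name:hash_slot:weight, so the weight is the last ':'-separated field.
intercept=$(printf '%s\n' "$sample" | awk -F: '$1 == "Constant" { print $NF }')
slot=$(printf '%s\n' "$sample" | awk -F: '$1 == "Constant" { print $2 }')
echo "slot=$slot intercept=$intercept"
```

Taking the last field (`$NF`) rather than a fixed position keeps the weight lookup working even if a feature name on some other line happens to contain extra characters before the hash slot.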