Null linear regression model in vowpal wabbit

Question

I would like to run a linear regression in vowpal wabbit using the null model (intercept only, for comparison purposes). Which optimizer should I use for this? Also, is the reported best constant loss that of the simple average?

arielf · Accepted Answer

A1: For linear regression, if you care about averages, you should use --loss_function squared (which is the default). If you care more about the median than the average (e.g. if you have some outliers that may greatly mess up the average), use --loss_function quantile (its default --quantile_tau of 0.5 targets the median). BTW: these are not optimizers, just loss functions. I would leave the optimizer (enhanced SGD, the default) as is, since it works very well.
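A quick way to see why the choice of loss matters is to compare the constant each loss would pick. The sketch below assumes plain `label |features` lines with unit importance weights (the function name `mean_and_median` is hypothetical). It computes the mean, which minimizes squared loss, and the median, which minimizes absolute loss (quantile loss with tau = 0.5), using only awk and sort.

```shell
# Sketch: mean vs. median of the labels in a VW-format file (the label is the
# first whitespace-separated field; unit importance weights are assumed).
mean_and_median() {
  # Pull out the labels, sort them numerically, then compute both statistics.
  awk '{ print $1 }' | sort -n | awk '
    { y[NR] = $1; sum += $1 }
    END {
      if (NR == 0) exit 1
      mean = sum / NR
      if (NR % 2 == 1) median = y[(NR + 1) / 2]
      else             median = (y[NR / 2] + y[NR / 2 + 1]) / 2
      printf "mean=%g median=%g\n", mean, median
    }'
}

# One outlier (100) pulls the mean far from the bulk of the data;
# the median barely moves.
printf '1 |x a\n2 |x b\n3 |x c\n100 |x d\n' | mean_and_median
# prints: mean=26.5 median=2.5
```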

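A2: best constant is the constant prediction that would give the lowest error, and best constant loss is the average error of always predicting that best constant. With the default squared loss, the best constant is the (importance-)weighted average of all your target values. This is not the same as the intercept B in the linear-regression formula y = A1*x1 + A2*x2 + ... + B. B is the free term, independent of the inputs, and in a model with inputs it is not necessarily the average of the ys. In the null (intercept-only) model there are no xi, so the optimal B under squared loss is exactly that weighted average, i.e. the best constant.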
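For the squared-loss case, the best constant and its loss can be reproduced by hand. This is a sketch assuming each line starts with `label [importance]` before the features; the helper name `best_constant` is hypothetical, and its output may not match the figures vw prints exactly, since vw does its own internal bookkeeping.

```shell
# Sketch: weighted best constant (mean label) and the weighted average squared
# error of always predicting it. Assumes "label [importance] [tag]|features"
# lines; field 2 is used as the importance weight only if it is numeric.
best_constant() {
  awk '
    {
      y = $1; w = 1
      if (NF >= 2 && $2 ~ /^[-+]?[0-9]*\.?[0-9]+([eE][-+]?[0-9]+)?$/) w = $2
      n++; ys[n] = y; ws[n] = w
      sw += w; swy += w * y
    }
    END {
      if (sw == 0) exit 1
      c = swy / sw
      for (i = 1; i <= n; i++) loss += ws[i] * (ys[i] - c) ^ 2
      printf "best_constant=%g best_constant_loss=%g\n", c, loss / sw
    }'
}

# Label 1 with weight 3 and label 5 with weight 1: weighted mean is 2.
printf '1 3 |f a\n5 |f b\n' | best_constant
# prints: best_constant=2 best_constant_loss=3
```

The weighted variance of the labels around their weighted mean is exactly this "best constant loss" under plain squared error, which is why it is a natural baseline for comparing a full model against the null model.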

A3: If you want to find the intercept of your model, look for the weight named Constant in your model. This would require two short steps:

# 1) Train your model from the dataset
#    and save the model in human-readable (aka "inverted hash") format
vw --invert_hash model.ih your_dataset

# 2) Search for the free/intercept term in the readable model 
grep '^Constant:' model.ih
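To fit the null model itself, one possible approach (a sketch, not something this answer prescribes) is to strip every feature so that only the Constant feature vw adds on its own remains, then train and grep in the same way. The helper name `to_null_model` and the file names are hypothetical, and this assumes vw accepts a label followed by an empty `|` namespace.

```shell
# Sketch: turn VW examples into intercept-only examples by dropping everything
# after the first '|'. The label, optional importance weight and optional tag
# are kept; vw adds its Constant feature by itself (unless --noconstant).
to_null_model() {
  sed 's/|.*$/|/'
}

# Hypothetical usage (needs vw and your own data file):
#   to_null_model < your_dataset > null_dataset
#   vw --invert_hash null.ih null_dataset
#   grep '^Constant:' null.ih
```

With a single pass, the learned Constant is an online SGD estimate, so it may not equal the mean label exactly; running several passes (e.g. `-c --passes 10`) should usually bring it closer.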

The output of the grep step should be something like:

Constant:116060:-1.085126

Here 116060 is the hash slot (location in the model) and -1.085126 is the value of the intercept (assuming no hash collisions and a model that is a linear combination of the inputs).
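If you only need the number (for example, to compare the intercept against the mean label), it can be cut out of that line. A minimal sketch; the function name `intercept_from_model` is hypothetical, and it assumes exactly the three-field `Constant:<hash slot>:<weight>` layout shown in the sample output.

```shell
# Sketch: print only the intercept value from an --invert_hash model file,
# assuming the "Constant:<hash slot>:<weight>" layout shown above.
intercept_from_model() {
  # Reads the file named in $1, or stdin when no argument is given.
  awk -F: '$1 == "Constant" { print $3; exit }' "${1:--}"
}

# Hypothetical usage:
#   intercept_from_model model.ih
```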
