Reputation: 33
I'm confused about the way vw extracts features. Consider a text classification problem where I want to use character n-grams as features. In the simplest case that illustrates my question, the input string is "aa" and I use 1-gram features only. So the example should consist of a single feature "a" with a count of 2, as follows:
$ echo "1 |X a:2" | vw --noconstant --invert_hash f && grep '^X^' f
Num weight bits = 18
learning rate = 0.5
initial_t = 0
power_t = 0.5
using no cache
Reading datafile =
num sources = 1
average since example example current current current
loss last counter weight label predict features
1.000000 1.000000 1 1.0 1.0000 0.0000 1
finished run
number of examples per pass = 1
passes used = 1
weighted example sum = 1
weighted label sum = 1
average loss = 1
best constant = 1
total feature number = 1
X^a:108118:0.196698
However, if I pass the same characters to vw as separate tokens ("a a", with a space between them), vw reports 2 features:
$ echo "1 |X a a" | vw --noconstant --invert_hash f && grep '^X^' f
Num weight bits = 18
learning rate = 0.5
initial_t = 0
power_t = 0.5
using no cache
Reading datafile =
num sources = 1
average since example example current current current
loss last counter weight label predict features
1.000000 1.000000 1 1.0 1.0000 0.0000 2
finished run
number of examples per pass = 1
passes used = 1
weighted example sum = 1
weighted label sum = 1
average loss = 1
best constant = 1
total feature number = 2
X^a:108118:0.375311
The actual model contains only a single feature, as I would expect (both occurrences of "a" hash to the same index, 108118), but its weight (0.375311) differs from the weight in the first model (0.196698).
When training on real datasets with higher-order n-grams, I see substantial differences in average loss depending on which of the two input formats is used. I looked at the source code in parser.cc, and given more time I could probably figure out what's going on; but if someone could explain the discrepancy between the two cases above (is it a bug?) and/or point me to the relevant portions of the source, I'd appreciate the help.
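(In case it helps anyone diagnose this: vw's --audit flag, short form -a, prints each parsed feature of every example along with its hash, value, and current weight, so comparing the two runs below should show directly how each input is tokenized. I'm assuming here that duplicated tokens show up as separate audit entries; I haven't pasted the audit output.)
$ echo "1 |X a:2" | vw --noconstant --audit
$ echo "1 |X a a" | vw --noconstant --audit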
Upvotes: 1
Views: 1064
Reputation: 3111
I suppose the total feature number is just a running counter of observed feature instances: one feature per example times ten passes gives 10 for the following command:
$ echo "1 |X a" | vw --noconstant --passes 10 --cache_file f -k
I also saw code in vw that divides a feature's regressor value by the feature's value before printing it out. That can be seen from the following:
$ echo "1 |X a:1" | vw --noconstant --invert_hash f && grep '^X^' f
X^a:108118:0.393395
$ echo "1 |X a:2" | vw --noconstant --invert_hash f && grep '^X^' f
X^a:108118:0.196698
$ echo "1 |X a:3" | vw --noconstant --invert_hash f && grep '^X^' f
X^a:108118:0.131132
$ echo "1 |X a:10" | vw --noconstant --invert_hash f && grep '^X^' f
X^a:108118:0.039344
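In fact, 0.393395 / 2 = 0.196698 and 0.393395 / 3 = 0.131132 (after rounding), matching the printed values exactly. The a:10 case is close but not exact: 0.393395 / 10 ≈ 0.03934 versus the printed 0.039344, presumably because the learned regressor itself also depends on the feature value.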
I would have expected duplicate features to collapse, so that examples like "|X a" and "|X a a" give the same result, but they don't:
$ echo "1 |X a" | vw --noconstant --invert_hash f && grep '^X^' f
X^a:108118:0.393395
$ echo "1 |X a a" | vw --noconstant --invert_hash f && grep '^X^' f
X^a:108118:0.375311
$ echo "1 |X a a" | vw --noconstant --invert_hash f && grep '^X^' f
X^a:108118:0.366083
I don't really know why; there must be some logic behind it. But it works as I'd expect if you specify --sort_features:
$ echo "1 |X a" | vw --noconstant --invert_hash f && grep '^X^' f
X^a:108118:0.393395
echo "1 |X a a a a a" | vw --noconstant --invert_hash f --sort_features && grep '^X^' f
X^a:108118:0.393395
An interesting detail: if you specify --sort_features, vw uses only the first occurrence of a duplicated feature. Example:
$ echo "1 |X a a:10" | vw --noconstant --invert_hash f --sort_features && grep '^X^' f
X^a:108118:0.393395
$ echo "1 |X a a:2" | vw --noconstant --invert_hash f --sort_features && grep '^X^' f
X^a:108118:0.393395
$ echo "1 |X a:10 a" | vw --noconstant --invert_hash f --sort_features && grep '^X^' f
X^a:108118:0.039344
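Note that 0.393395 in the first two runs is exactly the weight that a bare "|X a" produces above, while 0.039344 in the last run is exactly the weight of "|X a:10" alone, which is consistent with only the first occurrence being used.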
I hope these observations help you make vw work the way you need. I'm not sure whether this is a bug or a feature; I'll forward it to the vw authors for comment.
Upvotes: 2