Reputation: 2969
I am sorry, I feel I am overlooking something really obvious.
But how can the following happen?
$ cat myTrainFile.txt
1:1 |f 1:12 2:13
2:1 |f 3:23 4:234
3:1 |f 5:12 6:34
$ cat myTestFile.txt
14:1 |f 1:12 2:13
14:1 |f 3:23 4:234
14:1 |f 5:12 6:34
$ vw --csoaa 3 -f myModel.model --compressed < myTrainFile.txt
final_regressor = myModel.model
...
...
$ vw -t -i myModel.model -p myPred.pred < myTestFile.txt
only testing
Num weight bits = 18
...
...
$ cat myPred.pred
14.000000
14.000000
14.000000
So the test file is identical to the train file except for the labels. Hence, I would expect vw to reproduce the original labels that it learned from the train file, since it should ignore the labels in the test file completely.
However, it seems to reproduce the labels from the test file?!?
Clearly, I am doing something completely wrong here... but what?
Upvotes: 2
Views: 2124
Reputation: 2969
So, for completeness' sake, here is how it does work:
$ cat myTrainFile.txt
1:1.0 |f 1:12 2:13
2:1.0 |f 3:23 4:234
3:1.0 |f 5:12 6:34
$ cat myTestFile.txt
1 2 3 |f 1:12 2:13
1 2 3 |f 3:23 4:234
1 2 3 |f 5:12 6:34
$ vw -t -i myModel.model -p myPred.pred < myTestFile.txt
only testing
...
$ cat myPred.pred
2.000000
1.000000
2.000000
So it is maybe a bit surprising that none of the examples is classified correctly, but that is another problem.
Thanks @Martin Popel!
Upvotes: 0
Reputation: 2670
If you specify just one label in --csoaa (even in the -t test mode), it means that only that label is "available" for this example, so no other label can be predicted. This is another difference from --oaa (where you always specify just the correct label). See https://groups.yahoo.com/neo/groups/vowpal_wabbit/conversations/topics/2949.
If all labels are "available" (possible) for any test example, you must always include all the labels on each line. With -t you do not need to include the costs of the labels if you just want to get the --predictions (if you don't need vw to compute the test loss). So your myTestFile.txt should look like:
1 2 3 |f 1:12 2:13
1 2 3 |f 3:23 4:234
1 2 3 |f 5:12 6:34
and your myTrainFile.txt should look like:
1:0 2:1 3:1 |f 1:12 2:13
1:1 2:0 3:1 |f 3:23 4:234
1:1 2:1 3:0 |f 5:12 6:34
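If your data starts out with a single correct label per example, the two formats above can be generated mechanically. The following is only a minimal sketch in Python: the function names `csoaa_train_line` and `csoaa_test_line` are made up for illustration and are not part of vw, and the 0/1 costs simply mirror the example above (cost 0 for the correct label, cost 1 for every other label).

```python
def csoaa_train_line(correct_label, num_classes, features, namespace="f"):
    """Build one --csoaa training line that lists every label with a cost.

    Assumption: cost 0 for the correct label and cost 1 for all others,
    as in the example train file above.
    """
    costs = " ".join(
        f"{label}:{0 if label == correct_label else 1}"
        for label in range(1, num_classes + 1)
    )
    return f"{costs} |{namespace} {features}"


def csoaa_test_line(num_classes, features, namespace="f"):
    """Build one test line that lists all labels as available, without costs."""
    labels = " ".join(str(label) for label in range(1, num_classes + 1))
    return f"{labels} |{namespace} {features}"


# The three examples from the question: (correct label, feature string).
examples = [(1, "1:12 2:13"), (2, "3:23 4:234"), (3, "5:12 6:34")]

train_lines = [csoaa_train_line(y, 3, x) for y, x in examples]
test_lines = [csoaa_test_line(3, x) for _, x in examples]

print("\n".join(train_lines))
print("\n".join(test_lines))
```

Writing `train_lines` to myTrainFile.txt and `test_lines` to myTestFile.txt should give the two files shown above.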
Upvotes: 5