Confusion with Vowpal Wabbit Contextual Bandit training data formatting

Question

I am new to Vowpal Wabbit and am working on a multi-armed bandit model to recommend different CTAs for sign-up pop-ups. I have completed the walkthrough on the main site but am a bit confused about what the training data is supposed to look like for the --cb_explore_adf version. So far, for the regular versions (with a fixed number of actions), the data looks like:

action:cost:probability | features

which makes sense, but then when you get to the adf version, it becomes:

| a:1 b:0.5
0:0.1:0.75 | a:0.5 b:1 c:2
 
shared | s_1 s_2
0:1.0:0.5 | a:1 b:1 c:1
| a:0.5 b:2 c:1
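For reference, here is a minimal Python sketch of how I understand a multi-line block like the ones above is read. It assumes that a line starting with `shared` holds features common to all actions, that every other line describes one action, and that the one line carrying an `action:cost:probability` label is the action that was actually taken. As far as I can tell, in the adf format the action is identified by the position of its line rather than by the leading number in the label. `parse_adf_example` is a hypothetical helper written for illustration, not part of Vowpal Wabbit.

```python
def parse_adf_example(block):
    """Split one multi-line ADF example into shared features, actions, and label."""
    shared = None
    actions = []
    label = None
    for line in block.strip().splitlines():
        # Everything before the first "|" is either "shared", a label, or empty.
        head, _, features = line.partition("|")
        head = head.strip()
        features = features.strip()
        if head == "shared":
            shared = features
            continue
        if head:
            # Labelled line: action:cost:probability (leading number assumed unused here).
            _, cost, probability = head.split(":")
            label = {
                "chosen_index": len(actions),  # position of the labelled action line
                "cost": float(cost),
                "probability": float(probability),
            }
        actions.append(features)
    return {"shared": shared, "actions": actions, "label": label}


example = """shared | s_1 s_2
0:1.0:0.5 | a:1 b:1 c:1
| a:0.5 b:2 c:1"""

parsed = parse_adf_example(example)
print(parsed)
```

Read this way, the second block above says: given shared context `s_1 s_2`, two actions were available, the first one was chosen with probability 0.5, and it incurred a cost of 1.0.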

I've read the documentation numerous times and I still don't understand how this works.

An example showing how data similar to mine would be adapted to the above format would be great.

Example of my use case: 2 actions (1 and 2) and 3 features (language, country, favorite sport).

Some of the docs that I've looked at:

https://vowpalwabbit.org/tutorials/cb_simulation.html

[EDIT]:

Playing around with it, I created a train.txt with this input:

shared |user language=en nation=CAN
|action arm=10-OC-ValueProp10 
0:0:0.5 |action arm=11-OC-ValueProp11 

shared |user language=it nation=ITA
|action arm=10-OC-ValueProp10 
0:0:0.5 |action arm=11-OC-ValueProp11 

shared |user language=it nation=ITA
0:0:0.5 |action arm=10-OC-ValueProp10 
|action arm=11-OC-ValueProp11 

shared |user language=it nation=ITA
0:0:0.5 |action arm=10-OC-ValueProp10 
|action arm=11-OC-ValueProp11

But when I run this:

from vowpalwabbit import pyvw

vw = pyvw.vw("-d full_data.txt --cb_explore_adf -q ua --quiet --epsilon 0.2")
vw.predict("|user language=en nation=USA")

I get [1.0], which doesn't make sense. I am sure that I am doing something wrong.
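One likely explanation, offered as my understanding rather than a confirmed diagnosis: with --cb_explore_adf, predict returns one probability per action line in the example it is given. The call above passes a single line with no separate action lines, so there is only one candidate and it receives probability 1.0. The sketch below builds a multi-line prediction example in the same spirit as the `to_vw_example_format` function in the linked cb_simulation tutorial. `to_adf_predict_example` is a hypothetical helper name, and the arm names are copied from the training file above.

```python
def to_adf_predict_example(shared_features, arms):
    """Build an unlabelled multi-line example: one shared line plus one line per arm."""
    shared = " ".join(f"{name}={value}" for name, value in shared_features.items())
    lines = [f"shared |user {shared}"]
    # No label on any action line, because nothing has been chosen yet.
    lines += [f"|action arm={arm}" for arm in arms]
    return "\n".join(lines)


example = to_adf_predict_example(
    {"language": "en", "nation": "USA"},
    ["10-OC-ValueProp10", "11-OC-ValueProp11"],
)
print(example)

# Assumption: with a model created as in the question, this string would be
# passed as vw.predict(example), which should then return one probability
# per action line instead of a single value.
```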

Answers (1)

Training data
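As a rough sketch for the use case in the question (2 actions, 3 features), each logged interaction could be written as one multi-line example: a shared line with the user features, one line per action, and the `action:cost:probability` label placed on the line of the action that was shown. Everything concrete here is an assumption made for illustration: the helper name `to_adf_train_example`, the namespace names `user` and `action`, the feature values, and the choice to log a sign-up as cost -1 (Vowpal Wabbit minimizes cost, so a lower value means a better outcome).

```python
def to_adf_train_example(user, arms, chosen_index, cost, probability):
    """Return one multi-line --cb_explore_adf training example as a string."""
    user_features = " ".join(f"{name}={value}" for name, value in user.items())
    lines = [f"shared |user {user_features}"]
    for index, arm in enumerate(arms):
        action_line = f"|action arm={arm}"
        if index == chosen_index:
            # Only the action that was actually shown carries a label.
            action_line = f"0:{cost}:{probability} {action_line}"
        lines.append(action_line)
    return "\n".join(lines)


# Hypothetical logged interaction: arm 2 was shown with probability 0.5
# and the user signed up (logged here as cost -1).
user = {"language": "en", "country": "CAN", "sport": "hockey"}
example = to_adf_train_example(user, arms=[1, 2], chosen_index=1, cost=-1, probability=0.5)
print(example)
```

In a training file, consecutive examples like this one are separated by a blank line, as in the train.txt shown in the question.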