Reputation: 498
I plan to use the contextual bandit mode of Vowpal Wabbit (VW) to build a recommender system.
I have an M-dimensional (M = 26 in this case) numerical feature vector for each of N users, and feedback logs that record which user clicked which item (e.g. an ad). The total number of valid actions differs slightly between feedback logs (about 100~150). The only information available for the items (actions) is their unique ID.
So in this situation, I decided to use the ADF learning mode (--cb_explore_adf). But from the tutorial, it seems that VW only handles categorical features, not numerical ones. Anyway, I tried to format the test data like below:
shared |User feat_0=1.0 feat_1=0.00389094278216362 feat_2=0.004632890224456787 feat_3=0.003936515189707279 feat_4=0.0053831832483410835 ... feat_23=0.4192083477973938 feat_24=0.003969503100961447 feat_25=0.0038898871280252934
|Action item_id=hamny-kU9bbbbbak
|Action item_id=hamny-kU9bcxP9v1
...
|Action item_id=hamny-bbbbbcxP9v
|Action item_id=hamny-k7bbbbbcxd
|Action item_id=hamny-bbbbbbbbbc
|Action item_id=hamny-aaaaaaaaac
The example above asks the CB model to produce a PMF (prediction) over the ~100 actions, given the 26-dimensional user context features.
After getting a prediction from the model and observing the reward, the training data format would be:
shared |User feat_0=1.0 feat_1=0.00389094278216362 feat_2=0.004632890224456787 feat_3=0.003936515189707279 feat_4=0.0053831832483410835 ... feat_23=0.4192083477973938 feat_24=0.003969503100961447 feat_25=0.0038898871280252934
|Action item_id=hamny-kU9bbbbbak
|Action item_id=hamny-kU9bcxP9v1
...
|Action item_id=hamny-bbbbbcxP9v
0:-1:0.57124 |Action item_id=hamny-k7bbbbbcxd
|Action item_id=hamny-bbbbbbbbbc
|Action item_id=hamny-aaaaaaaaac
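For reference, here is a small pure-Python helper (hypothetical, just for illustration, not part of my actual pipeline) that builds this multiline ADF example from the user features, the item IDs, and the logged (action, cost, probability) triple:

```python
def build_adf_example(user_feats, item_ids, chosen_idx=None, cost=None, prob=None):
    """Build a VW --cb_explore_adf multiline example as a list of strings.

    If chosen_idx/cost/prob are given, the chosen action line gets the
    label 'action:cost:probability'; otherwise a test example is built.
    """
    shared = "shared |User " + " ".join(
        f"feat_{i}={v}" for i, v in enumerate(user_feats)
    )
    lines = [shared]
    for idx, item_id in enumerate(item_ids):
        label = ""
        if chosen_idx is not None and idx == chosen_idx:
            label = f"0:{cost}:{prob} "  # label on the chosen action line
        lines.append(f"{label}|Action item_id={item_id}")
    return lines

ex = build_adf_example(
    [1.0, 0.0039],
    ["hamny-k7bbbbbcxd", "hamny-aaaaaaaaac"],
    chosen_idx=0, cost=-1, prob=0.57124,
)
# ex[1] == "0:-1:0.57124 |Action item_id=hamny-k7bbbbbcxd"
```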
I'm not sure whether this is the proper format or not. But when I run a CTR simulation, I get almost the same result from the CB model regardless of the exploration option (e.g. epsilon, bag, softmax, etc.).
I used the same logic as the tutorial function (run_simulation). The only differences are the example's shared context, the number of actions, and ADF.
Upvotes: 0
Views: 238
Reputation: 821
The VW text format is quite simple. When specifying a feature, a ':' followed by a float lets you set the feature's value; if there is no explicit value after the name, the value is 1.
So when you supply a feature as feat_1=0.00389094278216362, it is a categorical feature with value 1. The important thing to note here is that if any part of that feature string changes, it results in a completely different feature (the entire string is hashed to determine its index), so feat_1=0.00389094278216363 (last character changed) is a completely different feature. There is no relation between the two.
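To illustrate the hashing point (a rough sketch only — VW actually uses MurmurHash3 modulo the weight-table size, so the real indices differ), here is how two nearly identical categorical feature strings land on unrelated buckets:

```python
import hashlib

def toy_index(feature_string, num_bits=18):
    # Stand-in for VW's feature hashing (VW uses MurmurHash3; MD5 is used
    # here only to show the principle: the *whole* string maps to an index).
    h = int(hashlib.md5(feature_string.encode()).hexdigest(), 16)
    return h % (1 << num_bits)

a = toy_index("feat_1=0.00389094278216362")
b = toy_index("feat_1=0.00389094278216363")  # last digit changed
# a and b are (almost certainly) different buckets: the two "values"
# share no weight, so the model sees no relation between them.
```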
You could try specifying the value like feat_1:0.00389094278216362, but I am not sure if that will really work. Perhaps if there is some sort of linear relationship between the feature and the outcome?
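If you go the name:value route, the shared line would look like this instead (my sketch, assuming VW's standard numeric-feature syntax); a tiny Python formatter:

```python
def format_numeric(namespace, values):
    # Emit VW numeric features as name:value, so the float is used as the
    # feature's value rather than being part of the hashed feature name.
    feats = " ".join(f"feat_{i}:{v}" for i, v in enumerate(values))
    return f"|{namespace} {feats}"

line = format_numeric("User", [1.0, 0.25])
# line == "|User feat_0:1.0 feat_1:0.25"
```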
You could also try binning the features to some number of decimal places with rounding. So feat_1=0.00389094278216362 may become feat_1=0.004.
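Binning is easy to sketch in Python (illustration only; the right precision is something you would have to tune empirically):

```python
def bin_feature(name, value, decimals=3):
    # Round the value and bake it into the categorical feature name, so
    # nearby values collide into the same hashed feature.
    return f"{name}={round(value, decimals)}"

binned = bin_feature("feat_1", 0.00389094278216362)
# binned == "feat_1=0.004"
```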
I am not sure of the theory behind what should be done here, but those are my thoughts of things you could try empirically.
Upvotes: 1