aiedu

Reputation: 142

XGBoost pairwise setup - python

I have tried several ways to get pairwise ranking in XGBoost to work with groups set via set_group, without success. The following code fails when set_group is called on xgbTrain, but runs fine when that call is commented out:

import numpy as np
import pandas as pd
import xgboost
from xgboost import DMatrix, train

xgb_params ={    
    'booster' : 'gbtree',
    'eta': 0.1,
    'gamma' : 1.0 ,
    'min_child_weight' : 0.1,
    'objective' : 'rank:pairwise',
    'eval_metric' : 'merror',
    #'num_class': 3,  # 
    'max_depth' : 6,
    'num_round' : 4,
    'save_period' : 0 
}


n_group=2
n_choice=3    

#training dataset

dtrain=np.random.uniform(0,100,[n_group*n_choice,2])    
dtarget=np.array([np.random.choice([0,1,2],3,False) for i in range(n_group)]).flatten()
dgroup=np.array([np.repeat(i,3)for i in range(n_group)]).flatten()

xgbTrain = DMatrix(dtrain, label = dtarget)
xgbTrain =xgbTrain.set_group(dgroup)

#watchlist

dtrain_eval=np.random.uniform(0,100,[n_group*n_choice,2])        

xgbTrain_eval = DMatrix(dtrain_eval, label = dtarget)
#xgbTrain_eval =xgbTrain_eval .set_group(dgroup)

#test dataset

dtest=np.random.uniform(0,100,[n_group*n_choice,2])    
dtestgroup=np.array([np.repeat(i,3)for i in range(n_group)]).flatten()

xgbTest = DMatrix(dtest)
#xgbTest =xgbTest.set_group(dgroup)
evallist  = [(xgbTrain_eval, 'eval')]

rankModel = xgboost.train(params=xgb_params,dtrain=xgbTrain  )
print(rankModel.predict( xgbTest))

The error returned seems to point to missing eval data, but even when I specify evals as

 rankModel = xgboost.train(params=xgb_params,dtrain=xgbTrain,evals=evallist )

the error remains.

Note that num_class is commented out. Intuitively, should it be set to 3 (the number of classes here) or 2 (the number of groups, in the case of pairwise ranking)?

Any help in pointing to what is wrong?

(Xgboost 0.6)

Upvotes: 3

Views: 3727

Answers (1)

aiedu

Reputation: 142

One error, mea culpa: set_group modifies the DMatrix in place and returns None, so its result must not be assigned back to xgbTrain. The call should be

     xgbTrain.set_group(dgroup)

and not

     xgbTrain =xgbTrain.set_group(dgroup)

The solution:

The array passed to set_group should contain only the number of items in each group, with one entry per group (not one group id per row).

      dgroup=np.array([n_choice for i in range(n_group)]).flatten()

That did it!
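To make the difference concrete, here is a minimal sketch of converting the per-row group ids used in the question into the per-group counts that set_group expects. The helper name group_sizes and the variable per_row_ids are hypothetical, introduced only for illustration, and the sketch assumes that rows belonging to the same group are stored contiguously.

```python
from itertools import groupby


def group_sizes(group_ids):
    """Return the length of each run of consecutive equal ids.

    Assumes rows of the same group are adjacent, e.g.
    [0, 0, 0, 1, 1, 1] describes two groups of three rows each.
    """
    return [sum(1 for _ in run) for _, run in groupby(group_ids)]


# Per-row ids, as built in the question (one id per training row).
per_row_ids = [0, 0, 0, 1, 1, 1]

# Per-group counts: one entry per group.
sizes = group_sizes(per_row_ids)
print(sizes)  # [3, 3]
```

The resulting list would then be passed as xgbTrain.set_group(sizes), without assigning the return value. The counts should sum to the number of rows in the DMatrix.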

Upvotes: 2
