LUm-1

Reputation: 41

catboost: evaluation/test set with weights for observations

I'm working on a dataset containing a list of people (indexed by fiscal code). The target variable is binary (1: buys a book, 0: otherwise). All the predictors are categorical (e.g. nationality, city, road, income bin, and so on). A fiscal code can appear twice, and each instance/observation has a weight (1 if the fiscal code is not repeated, a value between 0 and 1 if it is).

For example, the dataset looks like this:

fiscal_code | weight | target | categorical info
AAAAA1      | 0.98   | 0      | ...
AAAAA1      | 0.02   | 1      | ...

I have two datasets with the same variables: one for training (X_train = matrix of categorical variables, y_train = target variable, train_weight = weight of every observation in the training set) and one for testing (same variables and meaning: X_test, y_test and test_weight).

I am trying a CatBoost model, CatBoostClassifier.

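Initialize booster and hyperparameters

import numpy as np
from catboost import CatBoostClassifier

# Positions of the pandas 'category' columns (np.category does not exist, so compare against the dtype name)
categorical_features_indices = np.where(X_train.dtypes == 'category')[0]

model = CatBoostClassifier(iterations=5000, learning_rate=0.1, depth=7, loss_function='Logloss', eval_metric='AUC')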

Fit model

model.fit(X_train,
          y_train,
          eval_set=(X_test, y_test),
          cat_features=categorical_features_indices,
          use_best_model=True,
          verbose=True,
          sample_weight=train_weight)

The question is: how can I take into account that the observations in the test set have weights too (test_weight)? Do you have any idea?

I read the documentation at https://tech.yandex.com/catboost/doc/dg/concepts/python-reference_catboostregressor_fit-docpage/ but did not find anything useful there, unlike in the LightGBM documentation (in case I should consider another boosting model).

Upvotes: 2

Views: 4507

Answers (1)

David Waterworth

Reputation: 2871

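My understanding is that this is a case where you need to use a Pool, i.e.

from catboost import Pool

# Weights and cat_features are set on each Pool; as far as I know, fit() expects
# cat_features and sample_weight to be left unset when X is already a Pool.
model.fit(Pool(X_train, y_train, weight=train_weight, cat_features=categorical_features_indices),
          eval_set=Pool(X_test, y_test, weight=test_weight, cat_features=categorical_features_indices),
          use_best_model=True,
          verbose=True)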
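
To illustrate what the test weights change in the evaluation, here is a minimal pure-Python sketch of a weighted Logloss, assuming the usual definition as a weight-normalised average of the per-observation losses. The helper `weighted_logloss`, the toy rows and the predicted probabilities are hypothetical and are not part of CatBoost. Whether a particular `eval_metric` actually uses the weights is worth checking in the CatBoost metric documentation.

```python
import math

def weighted_logloss(y_true, p_pred, weights, eps=1e-15):
    """Weight-normalised average of the per-observation log loss."""
    total = 0.0
    for y, p, w in zip(y_true, p_pred, weights):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total += w * -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / sum(weights)

# Hypothetical toy data: one fiscal code split into two weighted rows,
# mirroring the AAAAA1 example in the question.
y_test = [0, 1]
test_weight = [0.98, 0.02]
p_hat = [0.1, 0.1]  # made-up predicted probabilities of class 1

unweighted = weighted_logloss(y_test, p_hat, [1.0, 1.0])
weighted = weighted_logloss(y_test, p_hat, test_weight)

print("unweighted:", unweighted)
print("weighted:  ", weighted)
```

With these made-up numbers the weighted loss is lower than the unweighted one, because the row with target 1 (which the low predicted probability gets wrong) carries only 2% of the weight.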

Upvotes: 0
