bhaskarc
bhaskarc

Reputation: 9521

Grid search with LightGBM example

I am trying to find the best parameters for a lightgbm model using GridSearchCV from sklearn.model_selection. I have not been able to find a solution that actually works.

I have managed to set up a partly working code:

import numpy as np
import pandas as pd
import lightgbm as lgb
from sklearn.model_selection import GridSearchCV
from sklearn.model_selection import KFold

np.random.seed(1)

train = pd.read_csv('train.csv')
test = pd.read_csv('test.csv')
y = pd.read_csv('y.csv')
y = y.values.ravel()
print(train.shape, test.shape, y.shape)

categoricals = ['COL_A','COL_B']
indexes_of_categories = [train.columns.get_loc(col) for col in categoricals]

gkf = KFold(n_splits=5, shuffle=True, random_state=42).split(X=train, y=y)

param_grid = {
    'num_leaves': [31, 127],
    'reg_alpha': [0.1, 0.5],
    'min_data_in_leaf': [30, 50, 100, 300, 400],
    'lambda_l1': [0, 1, 1.5],
    'lambda_l2': [0, 1]
    }

lgb_estimator = lgb.LGBMClassifier(boosting_type='gbdt',  objective='binary', num_boost_round=2000, learning_rate=0.01, metric='auc',categorical_feature=indexes_of_categories)

gsearch = GridSearchCV(estimator=lgb_estimator, param_grid=param_grid, cv=gkf)
lgb_model = gsearch.fit(X=train, y=y)

print(lgb_model.best_params_, lgb_model.best_score_)

This seems to be working but with a UserWarning:

categorical_feature keyword has been found in params and will be ignored. Please use categorical_feature argument of the Dataset constructor to pass this parameter.

I am looking for a working solution or perhaps a suggestion on how to ensure that lightgbm accepts categorical arguments in the above code

Upvotes: 12

Views: 43716

Answers (2)

saeedghadiri
saeedghadiri

Reputation: 436

In case you are struggling with how to pass the fit_params, which happened to me as well, this is how you should do that:

fit_params = {'categorical_feature':indexes_of_categories}
clf = GridSearchCV(model, param_grid, cv=n_folds)
clf.fit(x_train, y_train, **fit_params)

Upvotes: 1

Mischa Lisovyi
Mischa Lisovyi

Reputation: 3213

As the warning states, categorical_feature is not one of the LGBMModel arguments. It is relevant in lgb.Dataset instantiation, which in the case of sklearn API is done directly in the fit() method see the doc. Thus, in order to pass those in the GridSearchCV optimisation one has to provide it as an argument of the GridSearchCV.fit() method in the case of sklearn v0.19.1 or as an additional fit_params argument in GridSearchCV instantiation in older sklearn versions

Upvotes: 8

Related Questions