Reputation: 260
I have a problem with LightGBM. When I call
lgb.train(...)
it finishes in less than a millisecond (the dataset has shape (10000, 25)),
and when I call predict, all the output values are the same.
import pandas as pd
import lightgbm as lgb
from sklearn.preprocessing import StandardScaler

train = pd.read_csv('data/train.csv', dtype=dtypes)
test = pd.read_csv('data/test.csv')
test.head()

# first 10,000 rows; features start at column 3, the last column is the label
X = train.iloc[:10000, 3:-1].values
y = train.iloc[:10000, -1].values

sc = StandardScaler()
X = sc.fit_transform(X)
#pca = PCA(0.95)
#X = pca.fit_transform(X)

d_train = lgb.Dataset(X, label=y)
params = {
    'learning_rate': 0.003,
    'boosting_type': 'gbdt',
    'objective': 'binary',
    'metric': 'binary_logloss',
    'sub_feature': 0.5,
    'num_leaves': 10,
    'min_data': 50,
    'max_depth': 10,
}
num_round = 10
clf = lgb.train(params, d_train, num_round, verbose_eval=1000)
X_test = sc.transform(test.iloc[:100,3:].values)
pred = clf.predict(X_test, num_iteration = clf.best_iteration)
When I print pred, all the values are 0.49.
It's my first time using the lightgbm module. Is there an error in my code, or should I look for some mismatch in the dataset?
Upvotes: 2
Views: 5018
Reputation: 464
Your num_round is too small; the model just starts to learn and then stops. Also, make verbose_eval smaller so you can watch the results during training. I suggest trying the lgb.train call below:
clf = lgb.train(params, d_train, num_boost_round=5000, verbose_eval=10, early_stopping_rounds = 3500)
Always use early_stopping_rounds, so that training stops when there is no evident improvement or the model starts to overfit.
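Note that early_stopping_rounds only takes effect when lgb.train is given at least one validation set to monitor. Here is a minimal sketch of that setup, assuming you carve a validation split out of your training data (the split below is illustrative; X, y and params are from your snippet):

from sklearn.model_selection import train_test_split

# hold out 20% of the training rows for validation (illustrative split)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.2, random_state=42)

d_train = lgb.Dataset(X_tr, label=y_tr)
d_val = lgb.Dataset(X_val, label=y_val, reference=d_train)

clf = lgb.train(params, d_train,
                num_boost_round=5000,
                valid_sets=[d_val],
                verbose_eval=10,
                early_stopping_rounds=3500)

# best_iteration is now set by early stopping, so prediction uses the best round found
pred = clf.predict(X_test, num_iteration=clf.best_iteration)

With a validation set in place, clf.best_iteration points to the best round on that set, which is what the num_iteration=clf.best_iteration in your predict call expects.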
Do not hesitate to ask more. Have fun.
Upvotes: 2