MadLordDev

Reputation: 260

LightGBM predicts the same value

I have a problem with LightGBM. When I call

lgb.train(.......)

it finishes in less than a millisecond (on a dataset of shape (10000, 25)), and when I call predict, all the output values are the same.

import pandas as pd
import lightgbm as lgb
from sklearn.preprocessing import StandardScaler

train = pd.read_csv('data/train.csv', dtype = dtypes)
test = pd.read_csv('data/test.csv')

test.head()

X = train.iloc[:10000, 3:-1].values
y = train.iloc[:10000, -1].values

sc = StandardScaler()
X = sc.fit_transform(X)

#pca = PCA(0.95)
#X = pca.fit_transform(X)

d_train = lgb.Dataset(X, label=y)
params = {}
params['learning_rate'] = 0.003
params['boosting_type'] = 'gbdt'
params['objective'] = 'binary'
params['metric'] = 'binary_logloss'
params['sub_feature'] = 0.5   # alias for feature_fraction
params['num_leaves'] = 10
params['min_data'] = 50       # alias for min_data_in_leaf
params['max_depth'] = 10

num_round = 10
clf = lgb.train(params, d_train, num_round, verbose_eval=1000)

X_test = sc.transform(test.iloc[:100,3:].values)

pred = clf.predict(X_test, num_iteration=clf.best_iteration)

When I print pred, all the values are 0.49.

It's my first time using the lightgbm module. Do I have an error in the code, or should I look for mismatches in the dataset?

Upvotes: 2

Views: 5018

Answers (1)

Ugur MULUK

Reputation: 464

Your num_round is too small; the model just starts to learn and stops there. Also, make verbose_eval smaller so you can see the evaluation results during training. My suggestion is to try the lgb.train call below:

clf = lgb.train(params, d_train, num_boost_round=5000, verbose_eval=10, early_stopping_rounds=3500)

Always use early_stopping_rounds, since the model should stop if there is no evident learning or it starts to overfit. Note that lgb.train needs a validation set (valid_sets) for early stopping to evaluate against, as in the sketch below.
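For reference, here is a minimal sketch of that call with a held-out validation set (X, y and params are taken from the question; verbose_eval and early_stopping_rounds follow the older lgb.train signature used above, while LightGBM >= 4.0 passes these as lgb.log_evaluation / lgb.early_stopping callbacks instead):

import lightgbm as lgb
from sklearn.model_selection import train_test_split

# Hold out part of the training data so early stopping can measure progress
X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.2, random_state=42)

d_train = lgb.Dataset(X_tr, label=y_tr)
d_valid = lgb.Dataset(X_val, label=y_val, reference=d_train)

clf = lgb.train(
    params,
    d_train,
    num_boost_round=5000,
    valid_sets=[d_valid],
    verbose_eval=10,             # print the validation logloss every 10 rounds
    early_stopping_rounds=3500,  # stop when the validation metric stops improving
)

# best_iteration is set by early stopping; use it when predicting
pred = clf.predict(X_val, num_iteration=clf.best_iteration)

With a binary objective, predictions stuck near 0.49 usually just mean the ensemble has barely moved off the prior; more boosting rounds (or a larger learning_rate) let it move.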

Do not hesitate to ask more. Have fun.

Upvotes: 2
