Reputation: 371
I'm experimenting LightGBM through Training API http://lightgbm.readthedocs.io/en/latest/Python-API.html#training-api and Scikit-learn API http://lightgbm.readthedocs.io/en/latest/Python-API.html#scikit-learn-api.
I've not been able to make a clear mapping between both API as highlighted in example below. The basic idea is to train on 50% of the synthetic dataset.
import numpy as np
import lightgbm as lgbm
# Generate Data Set
xs = np.linspace(0, 10, 100).reshape((-1, 1))
ys = xs**2 + 4*xs + 5.2
ys = ys.reshape((-1,))
# LGBM configuration
alg_conf = {
"num_boost_round":25,
"max_depth" : 3,
"num_leaves" : 31,
'learning_rate' : 0.1,
'boosting_type' : 'gbdt',
'objective' : 'regression_l2',
"early_stopping_rounds": None,
}
# Calling Regressor using scikit-learn API
sk_reg = lgbm.sklearn.LGBMRegressor(
num_leaves=alg_conf["num_leaves"],
n_estimators=alg_conf["num_boost_round"],
max_depth=alg_conf["max_depth"],
learning_rate=alg_conf["learning_rate"],
objective=alg_conf["objective"]
)
sk_reg.fit(xs[::2], ys[::2])
print("Scikit-learn API results")
print(sk_reg.predict(xs[1::2]))
# Calling Regressor using native API
train_dataset = lgbm.Dataset(xs[::2], ys[::2])
lg_reg = lgbm.train(alg_conf.copy(), train_dataset)
print("Native API results")
print(lg_reg.predict(xs[1::2]))
Scikit-learn API results
[ 14.35693851 14.35693851 14.35693851 14.35693851 14.35693851
14.35693851 14.35693851 14.35693851 14.35693851 14.35693851
25.37944751 25.37944751 25.37944751 25.37944751 25.37944751
35.10572544 35.10572544 35.10572544 35.10572544 35.10572544
46.50667974 46.50667974 46.50667974 46.50667974 46.50667974
59.44952419 59.44952419 59.44952419 59.44952419 59.44952419
75.42846332 75.42846332 75.42846332 75.42846332 75.42846332
109.4610814 109.4610814 109.4610814 109.4610814 109.4610814
109.4610814 109.4610814 109.4610814 109.4610814 109.4610814
109.4610814 109.4610814 109.4610814 109.4610814 109.4610814 ]
Native API results
[ 22.55947971 22.55947971 22.55947971 22.55947971 22.55947971
22.55947971 22.55947971 22.55947971 22.55947971 22.55947971
22.55947971 22.55947971 22.55947971 22.55947971 22.55947971
22.55947971 22.55947971 22.55947971 22.55947971 22.55947971
45.33537795 45.33537795 45.33537795 45.33537795 45.33537795
91.6376959 91.6376959 91.6376959 91.6376959 91.6376959
91.6376959 91.6376959 91.6376959 91.6376959 91.6376959
91.6376959 91.6376959 91.6376959 91.6376959 91.6376959
91.6376959 91.6376959 91.6376959 91.6376959 91.6376959
91.6376959 91.6376959 91.6376959 91.6376959 91.6376959 ]
Where could I find a clear equivalence between both API parameters ?
Thanks a lot.
Upvotes: 8
Views: 5638
Reputation: 371
I obtained the answer on LightGBM GitHub. Sharing the results below:
Adding alg_conf "min_child_weight": 1e-3, "min_child_samples": 20)
fixes the difference:
import numpy as np
import lightgbm as lgbm
# Generate Data Set
xs = np.linspace(0, 10, 100).reshape((-1, 1))
ys = xs**2 + 4*xs + 5.2
ys = ys.reshape((-1,))
# Or you could add to your alg_conf "min_child_weight": 1e-3, "min_child_samples": 20.
# LGBM configuration
alg_conf = {
"num_boost_round":25,
"max_depth" : 3,
"num_leaves" : 31,
'learning_rate' : 0.1,
'boosting_type' : 'gbdt',
'objective' : 'regression_l2',
"early_stopping_rounds": None,
"min_child_weight": 1e-3,
"min_child_samples": 20
}
# Calling Regressor using scikit-learn API
sk_reg = lgbm.sklearn.LGBMRegressor(
num_leaves=alg_conf["num_leaves"],
n_estimators=alg_conf["num_boost_round"],
max_depth=alg_conf["max_depth"],
learning_rate=alg_conf["learning_rate"],
objective=alg_conf["objective"],
min_sum_hessian_in_leaf=alg_conf["min_child_weight"],
min_data_in_leaf=alg_conf["min_child_samples"]
)
sk_reg.fit(xs[::2], ys[::2])
print("Scikit-learn API results")
print(sk_reg.predict(xs[1::2]))
# Calling Regressor using native API
train_dataset = lgbm.Dataset(xs[::2], ys[::2])
lg_reg = lgbm.train(alg_conf.copy(), train_dataset)
print("Native API results")
print(lg_reg.predict(xs[1::2]))
It works fine.
Upvotes: 5