Reputation: 1622
I have trained the following FLAML AutoML model (I have specified the estimator to be LightGBM):
from flaml import AutoML

automl = AutoML()
automl.fit(
    X_train,
    y_train,
    estimator_list=["lgbm"],
    task="classification",
    metric="roc_auc",
    eval_method="cv",
    n_splits=3,
    time_budget=training_seconds,
    sample=True,
    append_log=True,
    log_type="all",
    log_file_name=log_name,
    model_history=True,
    log_training_metric=True,
    verbose=3,
    seed=1234,
    early_stop=True,
)
A total of 3 features (x2, x3, and x5) are exposed to the model. I then check the feature importance:
import pandas as pd

ds = pd.DataFrame()
ds["feature"] = automl.model.estimator.feature_name_
ds["importance"] = automl.model.estimator.feature_importances_
ds = ds.sort_values(by="importance", ascending=False).reset_index(drop=True)
When I check the feature importance, only x3 has a non-zero importance; x2 and x5 both show an importance of 0.

Question. Does that tell me that only one feature (x3) has made it into the model? If so, why is the feature importance equal to 0 for x2 and x5? I thought they wouldn't appear in the feature importance table at all?
Upvotes: 0
Views: 1159
Reputation: 2670
For LightGBM, every feature has a reported feature importance, even those that are not used by any splits in the model.
Consider the following example in Python, using lightgbm==3.3.3.
import lightgbm as lgb
from sklearn.datasets import make_regression

# create a 3-feature dataset where only one feature is informative
X, y = make_regression(
    n_samples=10_000,
    n_features=3,
    n_informative=1
)

dtrain = lgb.Dataset(
    data=X,
    label=y,
)

# train a tiny model
bst = lgb.train(
    train_set=dtrain,
    params={
        "num_iterations": 10,
        "num_leaves": 5
    }
)

# note: all features are included in the feature importance,
# even the two that were never used in a split
bst.feature_importance()
# array([ 0,  0, 40], dtype=int32)
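
By default, feature_importance() uses importance_type="split", i.e. it counts how many splits use each feature, so a value of 0 literally means the feature was never used in any split. If you only want the features the model actually uses, you can filter out the zero-importance entries. A minimal sketch (the used_features name is just for illustration; LightGBM auto-names array columns Column_0, Column_1, ...):

# keep only features that appear in at least one split
used_features = [
    name
    for name, imp in zip(bst.feature_name(), bst.feature_importance())
    if imp > 0
]
print(used_features)  # e.g. ['Column_2']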
Upvotes: 1