Giampaolo Levorato

Reputation: 1622

Why do some features get a feature importance of 0 in LightGBM?

I have trained the following FLAML AutoML model (with the estimator list restricted to LightGBM):

    from flaml import AutoML

    automl = AutoML()
    automl.fit(
        X_train,
        y_train,
        estimator_list=["lgbm"],
        task="classification",
        metric="roc_auc",
        eval_method="cv",
        n_splits=3,
        time_budget=training_seconds,
        sample=True,
        append_log=True,
        log_type="all",
        log_file_name=log_name,
        model_history=True,
        log_training_metric=True,
        verbose=3,
        seed=1234,
        early_stop=True
    )

A total of 3 features (x2, x3 and x5) are exposed to the model. I then check the feature importances:

    import pandas as pd

    ds = pd.DataFrame()
    ds["feature"] = automl.model.estimator.feature_name_
    ds["importance"] = automl.model.estimator.feature_importances_
    ds = ds.sort_values(by="importance", ascending=False).reset_index(drop=True)

When I check the feature importance I get this:

      feature  importance
    0      x3         351
    1      x2           0
    2      x5           0

Question: does this tell me that only one feature (x3) made it into the model? If so, why do x2 and x5 get a feature importance of 0? I thought they simply wouldn't appear in the feature importance table.

Upvotes: 0

Views: 1159

Answers (1)

James Lamb

Reputation: 2670

For LightGBM, every feature in the training data has a reported feature importance, including features that are not used by any split in the model; unused features simply get an importance of 0.

Consider the following example in Python, using lightgbm==3.3.3.

import lightgbm as lgb
from sklearn.datasets import make_regression

# create a 3-feature dataset where only one feature is important
X, y = make_regression(
    n_samples=10_000,
    n_features=3,
    n_informative=1
)
dtrain = lgb.Dataset(
    data=X,
    label=y,
)

# train a tiny model
bst = lgb.train(
    train_set=dtrain,
    params={
        "num_iterations": 10,
        "num_leaves": 5
    }
)

# look: all features are included in the feature importances
bst.feature_importance()

# array([ 0,  0, 40], dtype=int32)
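
If you only want the features that the model actually uses, one option (a small sketch building on the example above, assuming pandas is installed) is to filter out the rows with zero split importance; LightGBM can also report gain-based importance via importance_type="gain" instead of the default split counts.

import pandas as pd

# split importance: the number of splits that use each feature
# gain importance: the total gain of the splits that use each feature
imp = pd.DataFrame({
    "feature": bst.feature_name(),
    "split_importance": bst.feature_importance(importance_type="split"),
    "gain_importance": bst.feature_importance(importance_type="gain"),
})

# keep only the features that are actually used by at least one split
print(imp[imp["split_importance"] > 0])

With the toy dataset above, only the single informative feature should survive the filter. The same filtering applies to the sklearn-style estimator in the question, whose feature_importances_ are split counts by default.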

Upvotes: 1
