Let me first explain the data set I am using.
I have three sets, built from time slices: the train set is the oldest data, the hold-out set is the newest data, and the eval set sits in between.
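Roughly, the split looks like this (a minimal sketch; the date column name and the cutoff dates are placeholders, not my real values):

# df is assumed to be a pandas DataFrame with one row per observation.
# 'event_date' and the cutoff dates are placeholders for illustration only.
df = df.sort_values('event_date')
train_set   = df[df['event_date'] < '2017-01-01']    # oldest slice
test_set    = df[(df['event_date'] >= '2017-01-01') &
                 (df['event_date'] < '2017-07-01')]  # middle (eval) slice
holdout_set = df[df['event_date'] >= '2017-07-01']   # newest slice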
Now I am building two models.
Model1:
from catboost import CatBoostClassifier

# Initialize CatBoostClassifier
model = CatBoostClassifier(
    # custom_loss=['Accuracy'],
    depth=9,
    random_seed=42,
    l2_leaf_reg=1,
    # has_time=True,
    iterations=300,
    learning_rate=0.05,
    loss_function='Logloss',
    logging_level='Verbose',
)

## Fitting catboost model; the eval set drives best-iteration selection
model.fit(
    train_set.values, Y_train.values,
    cat_features=categorical_features_indices,
    eval_set=(test_set.values, Y_test),
    # logging_level='Verbose'  # you can uncomment this for text output
)
Then I predict on the hold-out set.
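The prediction step is essentially this (holdout_set and Y_holdout are placeholder names for my hold-out slice):

from sklearn.metrics import log_loss, roc_auc_score

# Class-1 probabilities on the hold-out slice
holdout_proba = model.predict_proba(holdout_set.values)[:, 1]
print('AUC:    ', roc_auc_score(Y_holdout, holdout_proba))
print('LogLoss:', log_loss(Y_holdout, holdout_proba))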
Model2:
## Capture the best iteration found by model1 before reusing the variable name
best_iteration = model.get_best_iteration()

model = CatBoostClassifier(
    # custom_loss=['Accuracy'],
    depth=9,
    random_seed=42,
    l2_leaf_reg=1,
    # has_time=True,
    iterations=best_iteration,  # bestIteration from model1
    learning_rate=0.05,
    loss_function='Logloss',
    logging_level='Verbose',
)

## Fitting catboost model (no eval set this time)
model.fit(
    train.values, Y.values,
    cat_features=categorical_features_indices,
    # logging_level='Verbose'  # you can uncomment this for text output
)
Both models are identical except for iterations: the first model runs a fixed 300 rounds but is shrunk to bestIteration (because an eval set is provided), while the second model uses that bestIteration from model1 directly.
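One quick sanity check of the shrinkage, after fitting model1 (get_best_iteration and tree_count_ are standard CatBoost accessors):

print('best iteration:', model.get_best_iteration())
print('trees kept:    ', model.tree_count_)  # the shrunk model typically keeps best_iteration + 1 trees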
However, when I compare feature importances, they look drastically different.
Feature Score_m1 Score_m2 delta
0 x0 3.612309 2.013193 -1.399116
1 x1 3.390630 3.121273 -0.269357
2 x2 2.762750 1.822564 -0.940186
3 x3 2.553052 NaN NaN
4 x4 2.400786 0.329625 -2.071161
As you can see, feature x3, which was in the top 3 in the first model, dropped out entirely in the second model. Not only that, there is a large shift in scores between the models for a given feature. About 60 features present in model1 are absent from model2, and about 60 features present in model2 are absent from model1. delta is Score_m2 minus Score_m1. I have seen scores change a little between models, but never this drastically. AUC and LogLoss barely change whether I use model1 or model2.
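For reference, the comparison above was built roughly like this (a sketch; model1/model2 stand for the two fitted classifiers, feature_names for the column names, and I keep only features with nonzero importance, which is what produces the NaNs):

import pandas as pd

fi1 = pd.DataFrame({'Feature': feature_names,
                    'Score_m1': model1.get_feature_importance()})
fi2 = pd.DataFrame({'Feature': feature_names,
                    'Score_m2': model2.get_feature_importance()})

# Keep features each model actually used; an outer merge leaves NaN where a
# feature shows up in one model but not the other.
comparison = (fi1[fi1.Score_m1 > 0]
              .merge(fi2[fi2.Score_m2 > 0], on='Feature', how='outer'))
comparison['delta'] = comparison['Score_m2'] - comparison['Score_m1']
print(comparison.sort_values('Score_m1', ascending=False).head())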
Now I have the following questions about this situation:
1. Are these models unstable because of a small number of samples and a large number of features? If so, how can I check for this?
2. Are some features simply not giving much information about the outcome, so that whether they get used for a split is largely random? If so, how can I check for this?
3. Is CatBoost the right model for this situation?
Any help regarding this issue will be appreciated.
Upvotes: 1
Views: 1928
Yes. Trees in general are somewhat unstable. If you remove the least important feature, you can get a very different model.
Having more data reduces this tendency.
Having more features increases this tendency.
Tree algorithms are also randomized by nature, so results will differ from run to run.
Things to try:
- Run the model a large number of times with different random seeds, and use the spread of the results to determine which features are consistently least important (how many features do you have?). A sketch of this check follows the list.
- Try to balance your training set. This might require you to upsample the rarer cases.
- Get more data. Maybe you'll have to combine your train and test sets into one training set and use the hold-out as the test set.
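A minimal sketch of that seed-stability check, reusing the parameters and variable names from the question (the number of repeats is arbitrary):

import pandas as pd
from catboost import CatBoostClassifier

importances = []
for seed in range(20):  # more repeats give a better estimate
    m = CatBoostClassifier(depth=9, l2_leaf_reg=1, iterations=300,
                           learning_rate=0.05, loss_function='Logloss',
                           random_seed=seed, logging_level='Silent')
    m.fit(train_set.values, Y_train.values,
          cat_features=categorical_features_indices,
          eval_set=(test_set.values, Y_test))
    importances.append(m.get_feature_importance())

# Features whose importance swings wildly across seeds carry little stable signal.
imp = pd.DataFrame(importances, columns=train_set.columns)
stability = imp.agg(['mean', 'std']).T.sort_values('std', ascending=False)
print(stability.head(10))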
Upvotes: 1