The Great

Reputation: 7743

Efficient way to generate Lime explanations for full dataset

I am working on a binary classification problem with 1,000 rows and 15 features.

Currently I am using LIME to explain the prediction for each instance.

I use the code below to generate explanations for the full test DataFrame:

test_indx_list = X_test.index.tolist()
test_dict = {}
for n in test_indx_list:
    exp = explainer.explain_instance(X_test.loc[n].values, model.predict_proba, num_features=5)
    test_dict[n] = exp.as_list()

But this is not efficient. Is there an alternative approach to generate explanations / get feature contributions more quickly?

Upvotes: 6

Views: 3596

Answers (1)

Arson 0

Reputation: 777

From what the docs show, there isn't currently an option to run explain_instance in batches, although there are plans to add one. Once available, that should help a lot with speed on newer versions.

In the meantime, the most effective change for speed is decreasing the number of samples used to fit the local linear model:

explainer.explain_instance(..., num_features=5, num_samples=2500)

The default value for num_samples is 5000, which can be far more than you need depending on your model, and it is currently the argument that most affects the explainer's speed.

Another approach is to parallelize the snippet. It's a more involved solution: you run multiple instances of the loop at the same time and gather the results at the end. For that I'll leave a link, but it's not something I can hand you as a ready-made snippet out of the box.

Upvotes: 4
