Reputation: 57
I am trying to find anomalies in my dataset of 1000+ documents. I'm using the LIME ML interpreter to explain the predictions of the model (Isolation Forest). Its "mode" parameter lets me choose between classification and regression. I do not have a set of documents with known anomalies. Since Isolation Forest is an unsupervised learning method, and classification is a type of supervised learning used to classify observations into two or more classes, I ended up using regression. On the other hand, the outcome I get is binary: anomaly or no anomaly.
Which mode is the right one to use here?
Best Regards, Elle
Upvotes: 0
Views: 1055
Reputation: 1133
The other option I see is to hold out 10-20% of the data set while building the Isolation Forest trees. Score the model on this holdout to get the anomaly score (or average tree depth), and build the explainer on that. Then, when scoring new data, LIME will treat it as a regression problem. I am not sure how well this will work, though.
Upvotes: 0
Reputation: 6299
Not directly about LIME, but Shapley values can be used to create similar explanations for IsolationForest. See this answer.
Upvotes: 0
Reputation: 11
For us, what we have done is as follows:
We are also trying to find a better option than building a second-level Random Forest classifier.
Upvotes: 1