panador

Reputation: 25

Is it possible to dynamically adjust contamination parameter in Isolation Forest?

I built an anomaly detection model using Isolation Forest with the default setting for the contamination parameter (0.1). It works quite well on my current data set, but I now have different files with the same structure and a different row count. When I run the model on them, I no longer get accurate results unless I manually adjust the contamination parameter by trial and error until it fits.

I would like to run the model automatically as soon as a new file arrives, but the percentage of outliers varies from file to file, so I cannot get good results without changing the contamination parameter each time. Is there a way to calculate a new parameter every time a new file arrives, or is this model not suitable for my use case?

Upvotes: 0

Views: 1034

Answers (1)

Jon Nordby

Reputation: 6299

The contamination parameter is a hyperparameter, so it can be tuned with hyperparameter optimization. The typical approach in scikit-learn for small models and datasets is a grid search (GridSearchCV); see the user guide. This assumes that you have a robust quantitative way of evaluating your model's performance, for example a set of labeled anomalies to score predictions against.
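As a rough sketch of what that could look like, assuming you have at least some labeled rows (1 for normal, -1 for anomaly, matching what IsolationForest.predict returns): the helper name tune_contamination, the candidate values, and the synthetic data are all made up for illustration, and F1 on the anomaly class is just one possible metric.

```python
import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.metrics import f1_score, make_scorer
from sklearn.model_selection import GridSearchCV, StratifiedKFold


def tune_contamination(X, y, candidates=(0.01, 0.02, 0.05, 0.1, 0.2), seed=0):
    """Grid-search the contamination value (hypothetical helper).

    Assumes y uses 1 for inliers and -1 for outliers, the same encoding
    that IsolationForest.predict returns.
    """
    # Score each candidate by F1 on the outlier class (label -1).
    scorer = make_scorer(f1_score, pos_label=-1)
    search = GridSearchCV(
        IsolationForest(random_state=seed),
        param_grid={"contamination": list(candidates)},
        scoring=scorer,
        # Stratify so every fold contains some labeled outliers.
        cv=StratifiedKFold(n_splits=3, shuffle=True, random_state=seed),
    )
    # IsolationForest ignores y when fitting; y is only used for scoring.
    search.fit(X, y)
    return search


# Synthetic stand-in for one labeled file: 285 normal rows, 15 outliers.
rng = np.random.default_rng(0)
inliers = rng.normal(0, 1, size=(285, 2))
outliers = rng.uniform(6, 8, size=(15, 2))
X = np.vstack([inliers, outliers])
y = np.concatenate([np.ones(285, dtype=int), -np.ones(15, dtype=int)])

search = tune_contamination(X, y)
best_contamination = search.best_params_["contamination"]
print(best_contamination, search.best_score_)
```

After fitting, search.best_estimator_ is an IsolationForest refit on all rows with the selected contamination. Note that this only works where labels exist; for a new unlabeled file you would still need some other way to decide whether the tuned value carries over.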

Upvotes: 1
