Isolation forest: how to use multiple features to predict; every row is flagged as an anomaly

Question

I am trying to build an isolation forest with scikit-learn in Python to detect anomalies. I have attached an image of what the data may look like; I am trying to predict 'pages' based on several 'size' features. When I print(anomaly), every single row is flagged as -1, an anomaly. Is this because I am only using 'size2' to classify them? Is there a way to use multiple columns to help detect the anomalies? Should I set n_features to the number of columns I am using? Thank you so much for your help.

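from sklearn.ensemble import IsolationForest

# df is the DataFrame from the attached image ('pages' plus several 'size' columns)
model = IsolationForest(n_estimators=100, max_samples='auto', contamination='auto')
model.fit(df[['pages']])  # the model is fitted on the single column 'pages'
df['size2'] = model.decision_function(df[['pages']])  # note: this overwrites 'size2' with anomaly scores
df['anomaly'] = model.predict(df[['pages']])  # -1 = anomaly, 1 = normal
print(df.head(50))
anomaly = df.loc[df['anomaly'] == -1]
anomaly_index = list(anomaly.index)
print(anomaly)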

Answers (1)
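
A minimal sketch of one way to use several columns at once, offered under assumptions rather than as a diagnosis of the original data. IsolationForest is unsupervised, so it does not predict 'pages' from the 'size' columns; it scores each row by how easily it is isolated. There is no n_features argument to set: the number of features is taken from the columns passed to fit (available afterwards as n_features_in_), and max_features only controls how many of them each tree draws. The data below is synthetic, the column names 'size1' and 'size3' are hypothetical (only 'size2' and 'pages' appear in the question), and random_state is added just to make the run repeatable. Scores are written to a new 'score' column so that no input feature is overwritten.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import IsolationForest

# Synthetic stand-in for the poster's data (assumption: the real columns differ).
rng = np.random.default_rng(0)
n = 200
size1 = rng.normal(100, 10, n)
size2 = rng.normal(50, 5, n)
size3 = rng.normal(20, 2, n)
pages = np.round(size1 / 10 + rng.normal(0, 0.5, n))

# Inject five obvious outliers in the first five rows.
size1[:5] = [1000.0, 1100.0, 1200.0, 1300.0, 1400.0]
size2[:5] = [500.0, 550.0, 600.0, 650.0, 700.0]
size3[:5] = [200.0, 220.0, 240.0, 260.0, 280.0]

df = pd.DataFrame({"size1": size1, "size2": size2, "size3": size3, "pages": pages})

# Every column that should influence the anomaly score goes into one list.
feature_cols = ["size1", "size2", "size3", "pages"]

model = IsolationForest(n_estimators=100, max_samples="auto",
                        contamination="auto", random_state=0)
model.fit(df[feature_cols])

# Keep scores in their own column instead of overwriting a feature.
df["score"] = model.decision_function(df[feature_cols])  # lower = more anomalous
df["anomaly"] = model.predict(df[feature_cols])          # -1 = anomaly, 1 = normal

anomalies = df.loc[df["anomaly"] == -1]
print("features seen by the model:", model.n_features_in_)
print("rows flagged:", len(anomalies), "of", len(df))
print(df.sort_values("score").head(10))
```

If every row still comes back as -1 on the real data, comparing the distribution of df["score"] against zero may help show whether the threshold or the input columns are the issue.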
