My problem is as follows: I have a time series dataframe like so:
time value1 value2
1 random value here
2
3
4
5
I want to run IsolationForest for outlier detection on a rolling basis for columns value1 and value2. My approach so far has been to create a custom function and run it with rolling:
from sklearn.ensemble import IsolationForest

def outlier(data):
    clf = IsolationForest()
    # Fit on both columns of the window and return the label of its last row
    prediction = clf.fit_predict(data[['value1', 'value2']].values)
    return prediction[-1]
I then run apply on the data itself:
data.rolling(window='2D', on='time').apply(outlier)
This raises KeyError: "None of [Index(['value1', 'value2'], dtype='object', name='time')] are in the [index]", which seems to be because rolling only passes one column at a time to the function.
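A quick way to check the one-column-at-a-time behaviour is to pass a probe function that only records what it receives. This is a minimal sketch on made-up data; the frame `df`, the list `seen`, and the function `probe` are illustrative names, not part of the original code.

```python
import numpy as np
import pandas as pd

# Made-up stand-in for the real data (assumption: 'time' is a datetime column)
df = pd.DataFrame({
    "time": pd.date_range("2024-01-01", periods=5, freq="D"),
    "value1": np.arange(5, dtype=float),
    "value2": np.arange(5, dtype=float) * 10,
})

seen = []

def probe(window):
    # Record the type and dimensionality of whatever rolling hands over
    seen.append((type(window).__name__, window.ndim))
    return 0.0

df.rolling(window="2D", on="time").apply(probe)

# Every call received a 1-D Series (a single column), never a DataFrame
print(set(seen))
```

Because each call only sees a Series, indexing it with ['value1', 'value2'] looks those labels up in the window's index, which would explain the KeyError above.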
I then tried setting the parameter method='table', which raises ValueError: Data must be 1-dimensional, got ndarray of shape (1000, 2) instead. Apparently method='table' only works when the engine is numba, so I also tried setting that in the apply method:
data.rolling(window='2D', on='time', method='table').apply(outlier, engine='numba', raw=True)
Now the problem is scikit-learn's compatibility with numba; the error message is: Untyped global name 'IsolationForest': Cannot determine Numba type of <class 'abc.ABCMeta'>.
So I'm now wondering: is there a way to run a scikit-learn model with rolling functions, either without using the 'table' method or by somehow making it compatible with numba?
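One possible direction that avoids both method='table' and numba is to build the time windows by hand and fit the model on each multi-column slice. The sketch below is an assumption-laden illustration, not a confirmed solution: `rolling_outlier`, its `min_rows` parameter, the fixed `random_state`, and the sample data are all hypothetical, and the window bounds only approximate rolling's default right-closed behaviour.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import IsolationForest

def rolling_outlier(data, window="2D", cols=("value1", "value2"), min_rows=5):
    """Fit IsolationForest on each trailing time window and label its last row."""
    indexed = data.set_index("time").sort_index()
    cols = list(cols)
    offset = pd.Timedelta(window)
    labels = []
    for end in indexed.index:
        # Trailing window (end - offset, end], similar to rolling's default
        mask = (indexed.index > end - offset) & (indexed.index <= end)
        block = indexed.loc[mask, cols]
        if len(block) < min_rows:
            # Too few rows to fit a meaningful model (hypothetical threshold)
            labels.append(np.nan)
            continue
        clf = IsolationForest(random_state=0)
        # fit_predict returns 1 for inliers and -1 for outliers
        labels.append(clf.fit_predict(block.values)[-1])
    return pd.Series(labels, index=indexed.index, name="outlier")

# Made-up data: 20 rows at 6-hour spacing
rng = np.random.default_rng(0)
data = pd.DataFrame({
    "time": pd.date_range("2024-01-01", periods=20, freq="6h"),
    "value1": rng.normal(size=20),
    "value2": rng.normal(size=20),
})
result = rolling_outlier(data)
print(result)
```

This refits the model once per row, so it will be slow on long series; it trades speed for being able to hand the model a full 2-D window.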