How to make Isolation Forest detect an anomaly at the peak of the difference instead of at the first value flagged

Question

I am using Isolation Forest to identify anomalies in a very large DataFrame. The data is noisy, so I apply several filtering steps to smooth out the noise and make the true anomalies stand out. I then call .diff() on the smoothed series, which gives a mostly flat line that spikes when an anomaly occurs, and run Isolation Forest on that difference to flag the anomalies.

My issue is that Isolation Forest flags the anomaly at the earliest point where the difference starts to look anomalous, but I need it flagged at the peak of the difference.

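# Smooth the raw signal in three passes (denoise, Savitzky-Golay, rolling mean).
df["Ref Wt. Denoised"] = denoise(df["Ref Wt."].values, level=2)
df["Ref Wt. Savgol"] = apply_savgol_filter(df["Ref Wt. Denoised"], window_length=101, polyorder=3)
df["Ref Wt. Smoothed"] = df["Ref Wt. Savgol"].rolling(window=indexer).mean()
# Difference over 300 rows: roughly flat, with a spike where the level changes.
df["Ref Wt. Diff"] = df["Ref Wt. Smoothed"].diff(periods=300).fillna(0)

# detect_wob is the fitted Isolation Forest; predict() returns -1 for anomalies and 1 otherwise.
df["WOB Anomaly"] = detect_wob.predict(df["Ref Wt. Diff"].values.reshape(-1, 1))

df["WOB Zero Event"] = df["WOB Anomaly"] == -1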

I have tried using .shift() to correct for this, but a fixed manual offset works for some events and not for others. I really want to avoid changing the window sizes I use for smoothing, because that severely affects accuracy.
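One possible alternative to a fixed .shift() offset, sketched here under assumptions rather than as a tested fix for this data set: treat every run of consecutive -1 predictions as a single event, then keep only the row in that run where the absolute difference is largest. The helper name flag_event_peaks and the toy data are hypothetical, and the sketch assumes the index is unique and the difference column has no NaN values (as after .fillna(0)).

```python
import pandas as pd


def flag_event_peaks(diff: pd.Series, is_anomaly: pd.Series) -> pd.Series:
    """Return a boolean Series that is True only at the largest |diff|
    inside each run of consecutive anomaly flags (hypothetical helper)."""
    is_anomaly = is_anomaly.astype(bool)
    peaks = pd.Series(False, index=diff.index)
    if not is_anomaly.any():
        return peaks
    # A new run starts wherever the flag changes value; cumsum labels the runs.
    run_id = is_anomaly.ne(is_anomaly.shift(fill_value=False)).cumsum()
    # Keep flagged rows only, then find the index label of the peak in each run.
    flagged = diff.abs()[is_anomaly]
    peak_labels = flagged.groupby(run_id[is_anomaly]).idxmax()
    peaks.loc[peak_labels.to_numpy()] = True
    return peaks


# Toy data: two bumps in an otherwise flat difference signal.
diff = pd.Series([0, 0, 1, 3, 5, 2, 0, 0, -2, -6, -1, 0], dtype=float)
# Stand-in for `df["WOB Anomaly"] == -1`: the flags cover each whole bump.
is_anomaly = diff.abs() >= 1
peaks = flag_event_peaks(diff, is_anomaly)
print(peaks[peaks].index.tolist())  # [4, 9]
```

With the columns from the question, this would be called as flag_event_peaks(df["Ref Wt. Diff"], df["WOB Anomaly"] == -1). It only relocates flags that the model already produced, so the smoothing windows stay untouched. If two events sit close enough that their flags merge into one run, only the larger peak is kept.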

Image of Issue and Fix I'm Looking For
