Zach Tynes

How to make Isolation Forest flag an anomaly at the peak of the difference instead of at the first anomalous value

I am using Isolation Forest to identify anomalies in a very large data frame. The data is noisy, so I apply several filtering steps to smooth out the noise and make the true anomalies stand out. I then apply .diff() to the smoothed data, which produces a flat line that spikes whenever an anomaly occurs, and use Isolation Forest to identify those spikes.

My issue is that Isolation Forest flags the anomaly at the earliest point where it becomes detectable, but I need it flagged at the point of peak difference.
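To illustrate why the first flag lands early, here is a minimal sketch on made-up data (not the asker's): a flat signal with one smooth dip, differenced with the same .diff(periods=300) call. A fixed cutoff stands in for the Isolation Forest decision; that is an assumption, since the real model learns its own boundary.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for "Ref Wt. Smoothed": a flat signal with a smooth
# Gaussian-shaped dip of depth 10 centred on index 1500.
idx = np.arange(3000)
smoothed = pd.Series(-10.0 * np.exp(-(((idx - 1500) / 100.0) ** 2)))

# Same differencing step as in the question.
diff = smoothed.diff(periods=300).fillna(0)

# A fixed cutoff stands in for the detector's anomaly decision (assumption):
# the first sample whose difference falls below -cutoff is the first one flagged.
cutoff = 1.0
first_flag = int(np.argmax(diff.to_numpy() < -cutoff))

# The sample we actually want: the most negative difference in the dip.
peak = int(diff.idxmin())

print(f"first flagged sample: {first_flag}, peak difference at: {peak}")
```

Isolation Forest scores each sample on its own, so every sample on the leading edge that is already unusual enough gets -1; nothing in the model prefers the most extreme sample of an event.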

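# denoise(), apply_savgol_filter(), indexer and the fitted Isolation Forest detect_wob are defined elsewhere
df["Ref Wt. Denoised"] = denoise(df["Ref Wt."].values, level=2)
df["Ref Wt. Savgol"] = apply_savgol_filter(df["Ref Wt. Denoised"], window_length=101, polyorder=3)
df["Ref Wt. Smoothed"] = df["Ref Wt. Savgol"].rolling(window=indexer).mean()
# difference over 300 samples: flat in normal operation, spikes at an anomaly
df["Ref Wt. Diff"] = df["Ref Wt. Smoothed"].diff(periods=300).fillna(0)

# predict() returns -1 for anomalies and 1 for normal points
df["WOB Anomaly"] = detect_wob.predict(df["Ref Wt. Diff"].values.reshape(-1, 1))

df["WOB Zero Event"] = df["WOB Anomaly"] == -1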

I have tried using .shift() to correct the offset, but a fixed shift works for some events and not for others. I would really like to avoid changing the smoothing window size, because that severely affects accuracy.
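Instead of a fixed .shift(), one possible post-processing step is to keep a single flag per contiguous run of -1 predictions and move it to the sample with the largest absolute difference in that run. Below is a minimal sketch, assuming the data frame has a unique index; the function name flag_peaks_only is hypothetical and not part of pandas or scikit-learn.

```python
import numpy as np
import pandas as pd


def flag_peaks_only(diff: pd.Series, flagged: pd.Series) -> pd.Series:
    """Keep one flag per contiguous run of flagged samples, placed at the
    sample where |diff| is largest within that run (hypothetical helper).

    diff: the differenced signal, e.g. df["Ref Wt. Diff"]
    flagged: boolean Series aligned with diff, True where the detector returned -1
    """
    flagged = flagged.astype(bool)
    result = pd.Series(False, index=diff.index)
    if not flagged.any():
        return result
    # A new run id starts every time the flag switches on or off.
    run_id = flagged.ne(flagged.shift(fill_value=False)).cumsum()
    # Within each run of True values, take the label of the largest |diff|.
    peak_labels = diff.abs()[flagged].groupby(run_id[flagged]).idxmax()
    result.loc[peak_labels.to_numpy()] = True
    return result


# Tiny check on made-up data: two flagged runs, peaks at positions 3 and 8.
diff = pd.Series([0.0, -1.0, -3.0, -5.0, -2.0, 0.0, 0.0, 4.0, 6.0, 1.0])
flags = pd.Series([False, True, True, True, True, False, False, True, True, True])
print(np.flatnonzero(flag_peaks_only(diff, flags).to_numpy()))  # → [3 8]

# On the question's columns this would be:
# df["WOB Zero Event"] = flag_peaks_only(df["Ref Wt. Diff"], df["WOB Anomaly"] == -1)
```

Unlike a fixed shift, this does not assume a constant lag between the first flag and the peak. The trade-off is that if two events are close enough for their flagged runs to merge, only one flag survives.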

[Image: the issue and the fix I'm looking for]

Answers (1)

michaelt

If you can define a threshold, you could first find the peaks and then test only those peaks for an anomaly:

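import numpy as np
from scipy.signal import find_peaks

# threshold is the minimum peak height in "Ref Wt. Diff"; you have to pick it for your data
peaks, _ = find_peaks(df["Ref Wt. Diff"].to_numpy(), height=threshold)
df["Peak Indicator"] = 0  # init the Peak Indicator column
# find_peaks returns integer positions, so convert them to index labels
df.loc[df.index[peaks], "Peak Indicator"] = 1  # mark the peaks
peak_data = df[df["Peak Indicator"] == 1]
if not peak_data.empty:
    df["WOB Anomaly"] = np.nan  # init anomaly column; rows that are not peaks stay NaN
    # run the fitted Isolation Forest on the peak rows only
    df.loc[peak_data.index, "WOB Anomaly"] = detect_wob.predict(peak_data["Ref Wt. Diff"].values.reshape(-1, 1))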
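A self-contained sketch of the peak-finding step on made-up data may help with tuning; the signal shape, threshold and distance values here are all assumptions. Two details matter: find_peaks only returns local maxima, so if your events show up as dips in "Ref Wt. Diff" the signal has to be negated first, and the returned values are integer positions rather than index labels.

```python
import numpy as np
from scipy.signal import find_peaks

# Hypothetical "Ref Wt. Diff"-like signal: two dips (events) on a flat baseline.
idx = np.arange(3000)
diff = (-8.0 * np.exp(-(((idx - 800) / 60.0) ** 2))
        - 5.0 * np.exp(-(((idx - 2200) / 60.0) ** 2)))

threshold = 2.0  # assumed minimum event size; pick this for your own data

# find_peaks only returns local maxima, so negate the signal to locate dips.
# distance=300 removes smaller peaks closer than 300 samples to a taller one.
peaks, props = find_peaks(-diff, height=threshold, distance=300)

print(peaks)                  # integer positions of the dip bottoms
print(props["peak_heights"])  # depth of each dip
```

If your events are spikes rather than dips, pass the signal without negating it. With a DataFrame, map the positions to labels with df.index[peaks] before using .loc.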
