Reputation: 13
I am trying to automatically detect certain data points in my x,y scatterplots. I have thousands of them, so I need to implement an approach that has a good trade-off between accuracy and sensitivity. Visually, I can see my 'anomalous' data points, but I am struggling to pick them up statistically.
This is a typical scatter plot of my x,y data (please find xy data attached here: https://onlinetextsharing.com/untitled-722):
What I am mostly interested in is identifying the data points with a positive deviation, namely those circled in red below:
The ones circled in blue might be 'anomalous', but I understand they may be too close to the main cluster to be clearly (and/or statistically) picked as anomalous. The ones with a negative deviation (i.e., those circled in green) can also be flagged as anomalous, but I am not too interested in them.
What I am trying to achieve is something like the graph below (although any other approach is more than welcome). Basically, I would like to fit a curve that passes through the main cluster and isolate the data points within this main cluster. I can then flag those falling outside these hypothetical boundaries as potentially anomalous. Please note that the boundaries (as depicted by the red shaded area) do not need to be equally spaced along the curve; they can vary with the degree of spread of the points, if that makes sense.
I found a few ideas on this forum (e.g., Confidence interval for LOWESS in Python), but I am not sure these are applicable to my data. This is pretty much what I am after:
Example from: https://github.com/cerlymarco/tsmoothie
Any help is greatly appreciated. Thanks in advance!
Upvotes: 0
Views: 389
Reputation: 656
Without some code, little can be offered beyond generic suggestions (I am posting this as an answer rather than a comment because of the formatting).
You can fit a curve to the scattered data using polyfit().
Once you have the curve, you can flag as outliers the points whose distance from it exceeds a threshold.
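As a rough illustration of the fit-then-threshold idea, here is a minimal Python sketch using `numpy.polyfit`. The polynomial degree, the multiplier `k`, the use of the median absolute deviation as the residual scale, and the synthetic data are all assumptions made for this example; none of them come from the question's data, so they would need tuning against real plots.

```python
import numpy as np

def flag_positive_outliers(x, y, degree=2, k=3.0):
    """Fit a polynomial and flag points lying far above it.

    degree and k are illustrative assumptions, not recommended values.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    coeffs = np.polyfit(x, y, degree)
    residuals = y - np.polyval(coeffs, x)
    # Robust scale: median absolute deviation, rescaled so that it matches
    # the standard deviation when the residuals are normally distributed.
    mad = np.median(np.abs(residuals - np.median(residuals)))
    threshold = k * 1.4826 * mad
    # Only positive deviations are flagged, as the question asks.
    return residuals > threshold, coeffs

# Synthetic data (hypothetical): a quadratic trend with three injected spikes.
rng = np.random.default_rng(0)
x = np.linspace(0, 10, 200)
y = 0.5 * x**2 - x + rng.normal(0, 1.0, x.size)
y[[20, 100, 180]] += 15.0

mask, coeffs = flag_positive_outliers(x, y)
print("flagged indices:", np.flatnonzero(mask))
```

To flag negative deviations as well, compare `np.abs(residuals)` against the threshold instead. Note that an ordinary least-squares fit is itself pulled towards the outliers, so a robust or iterative refit may be worth considering if the anomalies are numerous.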
Here also are two similar queries in Mathworks forum that might be useful:
curve fitting to a scatter plot 1
curve fitting to a scatter plot 2
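The question also asks for boundaries that widen or narrow with the local spread of the points. One possible way to get that, sketched below in Python under stated assumptions, is to fit a polynomial and then estimate the residual scale in a sliding window of neighbouring points along x. The function name `varying_band_outliers`, the window size, the degree, the multiplier `k`, and the synthetic data are all hypothetical choices for illustration.

```python
import numpy as np

def varying_band_outliers(x, y, degree=2, window=41, k=3.0):
    """Flag points above a band whose width follows the local spread.

    All parameter defaults are illustrative assumptions.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    fitted = np.polyval(np.polyfit(x, y, degree), x)
    residuals = y - fitted

    # Walk through the points in order of x and estimate a robust local
    # scale (rescaled MAD) from each point's nearest neighbours.
    order = np.argsort(x)
    sorted_res = residuals[order]
    half = window // 2
    scale_sorted = np.empty(x.size)
    for i in range(x.size):
        neighbours = sorted_res[max(0, i - half): i + half + 1]
        med = np.median(neighbours)
        scale_sorted[i] = 1.4826 * np.median(np.abs(neighbours - med))

    # Map the local scale back to the original point order.
    local_scale = np.empty(x.size)
    local_scale[order] = scale_sorted

    upper = fitted + k * local_scale
    return y > upper, upper

# Synthetic data (hypothetical): noise that grows with x, two injected spikes.
rng = np.random.default_rng(1)
x = np.linspace(0, 10, 300)
y = 2.0 * x + rng.normal(0, 1.0, x.size) * (0.2 + 0.3 * x)
y[30] = 2.0 * x[30] + 5.0     # small in absolute terms, large for its region
y[250] = 2.0 * x[250] + 25.0  # large spike in the noisy region

flagged, upper = varying_band_outliers(x, y)
print("flagged indices:", np.flatnonzero(flagged))
```

A smoother such as LOWESS could replace the polynomial fit if the main cluster does not follow a simple polynomial shape; the windowed scale estimate would stay the same.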
Upvotes: 0