Reputation: 155
I know that DBSCAN requires two parameters (minPts and Eps). However, I am confused on what parameters are needed for OPTICS because some sources say it requires eps while others say it only requires minPts.
Which algorithm would be the better to use if I were trying to automatically determine the parameter values that would best discard outliers?
Upvotes: 2
Views: 15277
Reputation: 79
In summary, there are a few differences:
Memory Cost : The OPTICS clustering technique requires more memory as it maintains a priority queue (Min Heap) to determine the next data point which is closest to the point currently being processed in terms of Reachability Distance. It also requires more computational power because the nearest neighbour queries are more complicated than radius queries in DBSCAN.
Fewer Parameters : The OPTICS clustering technique does not need to maintain the epsilon parameter and is only given in the above pseudo-code to reduce the time taken. This leads to the reduction of the analytical process of parameter tuning.
OPTICS does not segregate the given data into clusters. It merely produces a Reachability distance plot and it is upon the interpretation of the programmer to cluster the points accordingly.
OPTICS is Relatively insensitive to parameter settings. Good result if parameters are just “large enough”.
For more details, you can refer to
https://medium.com/@xzz201920/optics-d80b41fd042a for optics
https://medium.com/@xzz201920/dbscan-e1e50128074c for dbscan
Upvotes: 2
Reputation: 77474
OPTICS can be run with eps=infinity. But then it is O(n^2) complexity. (Assuming that you have an implementation that actually uses indexes for acceleration.)
But OPTICS doesn't have an as well-defined concept of noise as DBSCAN. The closest you can get is to take the topmost level of the cluster hierarchy (i.e. the complete data set) minus anything that is in a cluster below. But given a hierarchical clustering, you can have "noise" at multiple levels in the hierarchy, so the concept of noise doesn't really work here anymore.
Upvotes: 2
Reputation: 1750
According to the original paper, both minPts and Eps are required. Those sources which say Eps is not required are probably using some method to automatically determine a good value for it. However, Eps is only included to reduce the runtime of the algorithm. It is not required.
Regarding which is best for outlier-removal, there's no better way than to support your decision with numbers: take a dataset and label its outliers, then run both algorithms against it. Use some sort of performance measurement for the clusters (AUC, F-score, etc.) to select the best.
Upvotes: 3