Reputation: 67
I have seen many DBSCAN algorithm implemented using a formula to estimate the neighborhood radius (Eps) based on the given minimum points within a cluster (k).
[full code] http://toolz.googlecode.com/svn/trunk/CWT/dbscan.py
% Analytical calculation of rad if not given
function [Eps] = epsilon(x,k)
[m,n] = size(x);
Eps = ((prod(max(x)-min(x))*k*gamma(.5*n+1))/(m*sqrt(pi.^n))).^(1/n);
I have searched extensively to understand how this analytical formula was derived but been unsuccessful.
Upvotes: 2
Views: 1093
Reputation: 2702
The estimation of a suboptimal radius is described in the OPTICS paper
Looking for Natural Patterns in Analytical Data. 2. Tracing Local Density with OPTICS
As outline in the paper there are assumptions to make this formulation useful.
To sum up, citing the article, the density of the objects of a dataset can be compared with the density of the same number of objects uniformly distributed in the same volume as the data set. If the data set has a uniform distribution, then the neighborhood radius eps , containing k points can be estimated.
Upvotes: 0
Reputation: 77495
Does it come with a scientific reference, or is this just something someone made up himself?
The formula looks like the volume formula of n-balls.
So it may be based on the idea that if the data were uniformly distributed on a cube, and all edges had the same length, this L2-sphere would be expected to have this number of points, not taking boundary effects into account.
However, if your data would look like this, you wouldn't need to run clustering. These assumptions are much too strong to make sense in practical applications.
In particular, if you cannot find a proof or explanation in literature.
I also would suggest to not use this code either. His "OPTICS" implementation was anything, but the OPTICS algorithm... there are better, proper implementations out there. For best results, you will also want to have index support.
Upvotes: 1