Justin Symon

Reputation: 1

DBSCAN input explanation

What exactly does the DBSCAN algorithm take as input?

Why do I get different output from Weka than from a hand-coded implementation of the algorithm?

The hand-coded implementation takes only 2 inputs, while Weka can take 3.

Can someone help me understand the algorithm please?

Upvotes: 0

Views: 1957

Answers (2)

Has QUIT--Anony-Mousse

Reputation: 77454

With "2 inputs", do you mean two variables (dimensions), by chance?

If your code only works with 2 dimensions, read up on distance functions. Most distance functions can be computed for more than two dimensions easily... for example, Euclidean distance is defined as

sqrt(pow(x_i-y_i, 2).sum())

which works well when you loop i from 1 to n > 2, too.
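As a minimal sketch of that formula, here is a Euclidean distance in Python that works for any number of dimensions. The function name `euclidean` is just an illustrative choice, not part of Weka or any other library.

```python
import math

def euclidean(x, y):
    """Euclidean distance between two equal-length sequences of numbers."""
    if len(x) != len(y):
        raise ValueError("points must have the same number of dimensions")
    # Sum the squared per-dimension differences, then take the square root.
    return math.sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)))

print(euclidean([0, 0], [3, 4]))        # 2 dimensions
print(euclidean([0, 0, 0], [1, 2, 2]))  # 3 dimensions
```

Nothing in the function depends on the number of dimensions, so the same code serves 2, 3, or n attributes.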

DBSCAN has two obvious parameters and one hidden one: minPts and epsilon are the obvious ones, and the hidden parameter is the distance function. The distance function has by far the largest effect on the results, and choosing it requires understanding your data. Unfortunately, there is no rule of thumb for choosing it. It really depends on your data.
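To illustrate how much the distance function matters, here is a small sketch that counts the epsilon neighborhood of one point under two different distances. The points, the `eps` value and the helper names (`manhattan`, `chebyshev`, `neighbors`) are made up for this example.

```python
def manhattan(x, y):
    """Sum of absolute per-dimension differences."""
    return sum(abs(a - b) for a, b in zip(x, y))

def chebyshev(x, y):
    """Largest absolute per-dimension difference."""
    return max(abs(a - b) for a, b in zip(x, y))

def neighbors(points, center, eps, dist):
    """All points within eps of center (the center itself included)."""
    return [p for p in points if dist(center, p) <= eps]

points = [(0, 0), (1, 0), (1, 1), (2, 2)]
center = (0, 0)
eps = 1.0

n_manhattan = neighbors(points, center, eps, manhattan)
n_chebyshev = neighbors(points, center, eps, chebyshev)
print(len(n_manhattan), len(n_chebyshev))
```

With minPts = 3, the same point with the same epsilon is a core point under the Chebyshev distance but not under the Manhattan distance, so the resulting clusters can differ.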

I'm not surprised that you get different results from the Weka implementation. It applies implicit data normalization, which tends to produce unexpected results. The best implementation of DBSCAN can IMHO be found in ELKI. If you enable data indexes, it is really fast.
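As a rough illustration of why normalization can change the outcome, the sketch below rescales each attribute to [0, 1] and checks which point is nearest to the first one before and after. Min-max scaling is an assumption made for the example; it may not be exactly what Weka does internally. The data and function names are invented.

```python
import math

def euclidean(x, y):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def min_max_scale(points):
    """Rescale every attribute (column) to the range [0, 1]."""
    columns = list(zip(*points))
    lows = [min(c) for c in columns]
    highs = [max(c) for c in columns]
    return [
        tuple((v - lo) / (hi - lo) if hi > lo else 0.0
              for v, lo, hi in zip(p, lows, highs))
        for p in points
    ]

def nearest_to_first(points):
    """Index of the point closest to points[0] (excluding itself)."""
    first = points[0]
    return min(range(1, len(points)),
               key=lambda i: euclidean(first, points[i]))

# Second attribute has a much larger range than the first.
points = [(1.0, 1000.0), (2.0, 1300.0), (9.0, 1010.0), (10.0, 5000.0)]

raw_nearest = nearest_to_first(points)
scaled_nearest = nearest_to_first(min_max_scale(points))
print(raw_nearest, scaled_nearest)
```

On the raw data the large-range attribute dominates the distance; after scaling, both attributes count equally and a different point becomes the nearest neighbor. Since DBSCAN is built entirely on such neighborhoods, the same epsilon and minPts can yield different clusters.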

Upvotes: 0

qqilihq

Reputation: 11454

The algorithm is described pretty well on Wikipedia. The configuration input is:

  • eps: Maximum distance for the epsilon neighborhood.
  • minPts: The minimum number of points required to form a dense region.

Briefly: a new cluster is created if the epsilon neighborhood around a data point contains at least minPts points. Further input:

  • the dataset (obviously)
  • (maybe) a distance function, if the algorithm allows parametrizing in that regard
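Putting these inputs together, a minimal, unoptimized DBSCAN sketch in Python might look like the following. It is only meant to show where the dataset, eps, minPts and the distance function enter; it is not the Weka or ELKI implementation, all names are illustrative, and it assumes a point counts as a member of its own neighborhood.

```python
import math

NOISE = -1

def euclidean(x, y):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def dbscan(points, eps, min_pts, dist=euclidean):
    """Return one label per point: 0, 1, ... for clusters, NOISE for noise."""
    labels = [None] * len(points)  # None means "not visited yet"

    def region_query(i):
        # Indices of all points within eps of point i (including i itself).
        return [j for j in range(len(points))
                if dist(points[i], points[j]) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        found = region_query(i)
        if len(found) < min_pts:
            labels[i] = NOISE  # may later be relabeled as a border point
            continue
        cluster += 1           # point i is a core point: start a new cluster
        labels[i] = cluster
        queue = [j for j in found if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point reached from a core point
            if labels[j] is not None:
                continue             # already assigned to a cluster
            labels[j] = cluster
            j_found = region_query(j)
            if len(j_found) >= min_pts:
                queue.extend(j_found)  # j is also a core point: keep expanding
    return labels

points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (50, 50)]
labels = dbscan(points, eps=1.5, min_pts=3)
print(labels)
```

An implementation with "2 inputs" typically hard-codes the distance function and takes only eps and minPts besides the data; one with "3 inputs" usually exposes the distance function as well, as the `dist` argument does here.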

Upvotes: 1
