DBSCAN algorithms in RapidMiner and scikit-learn

I am trying to find a clustering algorithm that can cluster nominal data in Python. I tried the DBSCAN algorithm in RapidMiner, and it worked with nominal data. But when I run the same dataset through the DBSCAN implementation provided by scikit-learn, it raises an error saying it could not convert string to float.

Are the DBSCAN implementations in RapidMiner and scikit-learn different, and how can I solve this problem? It would also be great if you could suggest another clustering algorithm that works with nominal data.

Upvotes: -1

Answers (2)

Has QUIT--Anony-Mousse

Reputation: 77495

scikit-learn's DBSCAN defaults to Euclidean distance (metric='euclidean'), which is not defined for nominal data.

You need to specify your distance measure!
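One way to do that, as a minimal sketch rather than a definitive recipe: encode each category as an integer code, compute a Hamming (mismatch) distance matrix, and pass it to DBSCAN with metric='precomputed'. The toy `records` array and the `eps` and `min_samples` values below are assumptions made up for illustration; they are not from the question and would need tuning for real data.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.metrics import pairwise_distances
from sklearn.preprocessing import OrdinalEncoder

# Hypothetical nominal dataset: two categorical attributes per row.
records = np.array([
    ["red", "small"],
    ["red", "small"],
    ["red", "medium"],
    ["blue", "large"],
    ["blue", "large"],
    ["green", "tiny"],
])

# Map each category to an integer code. The codes carry no order here,
# because Hamming distance only checks whether two values are equal.
codes = OrdinalEncoder().fit_transform(records)

# Hamming distance = fraction of attributes on which two rows differ.
dist = pairwise_distances(codes, metric="hamming")

# eps=0.6 (assumed) treats rows that differ in at most one of the two
# attributes as neighbours; min_samples=2 (assumed) counts the point itself.
labels = DBSCAN(eps=0.6, min_samples=2, metric="precomputed").fit_predict(dist)
print(labels)
```

A label of -1 marks a row that DBSCAN treats as noise. The precomputed matrix is quadratic in the number of rows, so this approach suits small to medium datasets.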

Upvotes: 2

Andrew Chisholm

Reputation: 6567

RapidMiner implements various distance measures including Nominal Distance. This is used by DBSCAN and other algorithms.

The distance between two examples is 0 if the values of the attributes are identical and 1 otherwise. In other words, "Raspberry" is a distance of 1 away from "Apple" and from "Computer". Likewise, "Apple" is 1 away from "Raspberry" and from "Computer", and so on.
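The rule just described can be sketched in a few lines of Python. This is only an illustration of the 0/1 idea, not RapidMiner's actual code; the function names are hypothetical, and summing the mismatches across several attributes is an assumption about how multi-attribute examples could be handled.

```python
def nominal_distance(a, b):
    """Return 0 if two nominal values are identical, 1 otherwise."""
    return 0 if a == b else 1


def example_distance(x, y):
    """Assumed extension: count the attributes on which two examples differ."""
    return sum(nominal_distance(a, b) for a, b in zip(x, y))


print(nominal_distance("Raspberry", "Apple"))     # different values
print(nominal_distance("Raspberry", "Computer"))  # different values
print(nominal_distance("Apple", "Apple"))         # identical values
```

Note that "Raspberry" is no closer to "Apple" than to "Computer": the measure only knows whether two values match, not how similar they are.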

Upvotes: 1
