I am trying to find a clustering algorithm to cluster nominal data in Python. For that purpose I tried the DBSCAN algorithm in RapidMiner, and it worked with nominal data. But when I try the same dataset with the DBSCAN implementation provided by scikit-learn, it raises an error saying it could not convert string to float.
Are the DBSCAN implementations in RapidMiner and scikit-learn different, and how can I solve this problem? Also, if you can suggest another clustering algorithm that works with nominal data, that would be great.
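scikit-learn's DBSCAN defaults to Euclidean distance (metric='euclidean'), which is not defined for nominal data.
You need to specify a distance measure that suits nominal data! Also note that scikit-learn expects numeric input, which is why fitting on raw strings fails with "could not convert string to float"; encode the categories as numbers (or pass a precomputed distance matrix) before fitting.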
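A minimal sketch of this in scikit-learn, assuming a small hypothetical dataset: the categories are integer-encoded with OrdinalEncoder, and DBSCAN is told to use metric='hamming', which measures the fraction of attributes whose values differ. Hamming distance is one reasonable choice for nominal attributes, not the only one, and the eps and min_samples values are tuned to this toy data only.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.preprocessing import OrdinalEncoder

# Hypothetical nominal dataset: one row per example, one column per attribute.
X_raw = np.array([
    ["red", "small", "apple"],
    ["red", "small", "apple"],
    ["red", "large", "apple"],
    ["blue", "large", "computer"],
    ["blue", "large", "computer"],
    ["blue", "small", "computer"],
])

# scikit-learn expects numeric input, so map each category to an integer code.
# The codes are arbitrary labels; their numeric order means nothing.
X = OrdinalEncoder().fit_transform(X_raw)

# Hamming distance = fraction of attributes whose values differ. It only
# tests codes for equality, so the arbitrary code order does not matter.
# eps and min_samples are chosen for this toy data only.
db = DBSCAN(eps=0.34, min_samples=2, metric="hamming")
labels = db.fit_predict(X)
print(labels)
```

On this toy data the first three rows and the last three rows end up in two separate clusters. Because Hamming distance only checks whether two codes are equal, the ordering OrdinalEncoder happens to assign to categories does not affect the result, which would not be true with Euclidean distance on the same codes.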
RapidMiner implements various distance measures, including Nominal Distance, which DBSCAN and other algorithms can use.
With this measure, the distance between two examples is 0 if their attribute values are identical and 1 otherwise. In other words, "Raspberry" is at distance 1 from both "Apple" and "Computer", and "Apple" is likewise at distance 1 from both "Raspberry" and "Computer".
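The 0/1 nominal distance described above can be mimicked in scikit-learn by building the distance matrix yourself and passing metric='precomputed' to DBSCAN. This is a sketch, not RapidMiner's implementation: the helper nominal_distance_matrix is hypothetical, and summing the 0/1 mismatches over several attributes is an assumption about how the measure extends beyond a single attribute.

```python
import numpy as np
from sklearn.cluster import DBSCAN


def nominal_distance_matrix(rows):
    """Hypothetical helper: pairwise 0/1 nominal distance.

    Each attribute contributes 0 if the two values are identical and 1
    otherwise. Summing over attributes is an assumption for the
    multi-attribute case, not confirmed RapidMiner behavior.
    """
    data = np.asarray(rows, dtype=object)
    if data.ndim == 1:
        data = data[:, None]  # a flat list is treated as a single attribute
    # Compare every pair of rows attribute by attribute via broadcasting.
    mismatches = data[:, None, :] != data[None, :, :]
    return mismatches.sum(axis=2).astype(float)


fruits = ["Raspberry", "Apple", "Computer", "Apple"]
D = nominal_distance_matrix(fruits)
print(D)

# With a precomputed matrix, DBSCAN never sees the strings themselves.
labels = DBSCAN(eps=0.5, min_samples=2, metric="precomputed").fit_predict(D)
print(labels)
```

Here the two "Apple" examples (distance 0) form a cluster, while "Raspberry" and "Computer" are labeled -1 (noise) because no other example shares their value. The broadcast comparison builds an n-by-n-by-attributes array, so this approach suits modest dataset sizes.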