Reputation: 5406
I'm trying to cluster some words (let's take car brands as an example). I can't use k-means or k-medoids for this, so I've tried Affinity Propagation from Sklearn, using levenshtein from the distance lib or damerau_levenshtein_distance from the pyxdameraulevenshtein lib as the metric.
Example here: https://stats.stackexchange.com/questions/123060/clustering-a-long-list-of-strings-words-into-similarity-groups
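Roughly, my current setup looks like this (a shortened sketch; the brand list and the use of the distance lib are just for illustration):

    import numpy as np
    import distance
    from sklearn.cluster import AffinityPropagation

    brands = ["MERCEDES-BENZ", "MERCEDES", "VOLVO", "FIAT", "PEUGEOT"]

    # Affinity Propagation expects similarities, so negate the Levenshtein distances
    similarity = -1 * np.array(
        [[distance.levenshtein(a, b) for b in brands] for a in brands]
    )

    ap = AffinityPropagation(affinity="precomputed")
    ap.fit(similarity)
    print(dict(zip(brands, ap.labels_)))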
However, these metrics are not exactly what I need. For example, MERCEDES-BENZ and MERCEDES have a distance of 5, the same as VOLVO and FIAT. Do you know of a metric that would give a higher similarity score to MERCEDES-BENZ / MERCEDES than to VOLVO / FIAT?
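For instance, both pairs come out at 5 with the distance lib:

    import distance

    print(distance.levenshtein("MERCEDES-BENZ", "MERCEDES"))  # 5 (delete "-BENZ")
    print(distance.levenshtein("VOLVO", "FIAT"))              # 5 as well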
Thanks, Djokx
Upvotes: 0
Views: 374
Reputation: 573
You could use the Jaccard similarity of the tri-grams composing those words. That is, you decompose each word into its overlapping three-character components (for volvo: vol, olv, lvo) and compute the Jaccard similarity between each pair of trigram sets. See N-gram.
The Jaccard similarity is defined as the ratio between the number of n-grams the two words share and the total number of distinct n-grams across both words (intersection over union): Jaccard index.
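A minimal sketch in plain Python (the trigrams and jaccard helpers are just illustrative names):

    def trigrams(word):
        """Set of overlapping three-character substrings of the word."""
        word = word.lower()
        return {word[i:i + 3] for i in range(len(word) - 2)}

    def jaccard(a, b):
        """Jaccard similarity: shared trigrams / all distinct trigrams."""
        ta, tb = trigrams(a), trigrams(b)
        return len(ta & tb) / len(ta | tb)

    print(jaccard("MERCEDES-BENZ", "MERCEDES"))  # 6/11 ~ 0.55
    print(jaccard("VOLVO", "FIAT"))              # 0.0, no trigram in common

Since Affinity Propagation takes similarities, you could feed this Jaccard matrix directly as the precomputed affinity instead of the negated Levenshtein distances.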
Upvotes: 1