Aman Verma

Reputation: 1

I am getting an error when clustering data with both categorical and numerical features using the k-prototypes algorithm

I am using the KPrototypes clustering algorithm to cluster mixed variables, which include both categorical and numerical columns, and I am getting this error:
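For context, Huang's k-prototypes extends k-means to mixed data by combining two distances: squared Euclidean distance on the numerical columns, plus a weight gamma times the number of mismatches on the categorical columns. The sketch below is a minimal pure-Python illustration of that measure, not the kmodes library's implementation. The function name, the sample records and the gamma value are made up for the example.

```python
def kprototypes_dissim(x, y, categorical, gamma):
    """Huang-style mixed dissimilarity between two records (illustrative only).

    Numerical positions contribute squared differences; categorical
    positions contribute gamma for every mismatch.
    """
    num = sum((a - b) ** 2 for j, (a, b) in enumerate(zip(x, y)) if j not in categorical)
    cat = sum(a != b for j, (a, b) in enumerate(zip(x, y)) if j in categorical)
    return num + gamma * cat


# Made-up records: (age, income, city, plan); positions 2 and 3 are categorical.
p = (30, 5.0, "Delhi", "basic")
q = (32, 4.0, "Delhi", "premium")
print(kprototypes_dissim(p, q, categorical={2, 3}, gamma=0.5))  # 5.5
```

The numerical positions must hold numbers for the subtraction to work. The categorical positions are only compared for equality. This is one reason each column should hold a single, consistent type.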

'>' not supported between instances of 'str' and 'int'

All the features also have consistent data types. I have shared the code snippet I ran, along with a screenshot of the data. I just want to cluster this type of data, so any suggestions for clustering are welcome.

My data looks like this: [screenshot: Excel view of the data]

The information about the data is as follows: [screenshot: info of the data]

This is the code I wrote for the k-prototypes algorithm:

from kmodes.kprototypes import KPrototypes

kp = KPrototypes(n_clusters=3, init='random', verbose=True)
kp.fit(X_dummy, categorical=[7, 8, 9, 10, 11, 12, 13])

Please also check the list of categorical column indices that I passed to kp.fit.
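The `categorical` argument to `kp.fit` is a list of 0-based column positions in `X_dummy`. Deriving those positions from column names, rather than typing them by hand, makes an off-by-one or stale list less likely. The sketch below is a minimal example under that assumption. The column names are hypothetical placeholders; with a pandas DataFrame you would take them from `df.columns`.

```python
def categorical_positions(columns, categorical_names):
    """Map categorical column names to the 0-based positions passed as `categorical`."""
    missing = [name for name in categorical_names if name not in columns]
    if missing:
        raise KeyError(f"columns not found: {missing}")
    return sorted(columns.index(name) for name in categorical_names)


# Hypothetical layout: 7 numerical columns followed by 7 categorical ones.
columns = [f"num_{i}" for i in range(7)] + [f"cat_{i}" for i in range(7)]
cat_names = [c for c in columns if c.startswith("cat_")]
print(categorical_positions(columns, cat_names))  # [7, 8, 9, 10, 11, 12, 13]
```

With this layout, `[7, ..., 13]` refers to the last seven of fourteen columns. If the real array has a different column order or count, the hand-written list would point at the wrong columns.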

Upvotes: 0

Views: 563

Answers (2)

mnm

Reputation: 2022

This message suggests that you are trying to compare a string object (str) with an integer (int). You need to clean the data before applying an algorithm: garbage in, garbage out.
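The message is Python 3's standard `TypeError` for ordering a string against an integer. Below is a minimal reproduction; the values `"Yes"` and `0` are made up. The same comparison succeeds once both sides are strings.

```python
def compare(a, b):
    """Return a > b, or the TypeError message Python 3 raises for incompatible types."""
    try:
        return a > b
    except TypeError as exc:
        return str(exc)


print(compare("Yes", 0))    # '>' not supported between instances of 'str' and 'int'
print(compare("0", "Yes"))  # False
```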

Upvotes: 0

Has QUIT--Anony-Mousse

Reputation: 77474

Column L in your table contains strings and numbers (0).

That is probably causing the error.
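If that is the cause, one fix is to give every categorical column a single type. For example, casting it to strings turns `0` into the category `"0"`. Whether `0` should be its own category or a missing value depends on the data. The pure-Python sketch below uses made-up rows: it finds columns that mix types, then casts chosen columns to `str`. In pandas, the equivalent cast for one column is `df[col] = df[col].astype(str)`, where `df` and `col` are placeholder names.

```python
def mixed_type_columns(rows, columns):
    """Return the names of columns whose values have more than one Python type."""
    return [
        name
        for j, name in enumerate(columns)
        if len({type(row[j]) for row in rows}) > 1
    ]


def stringify(rows, positions):
    """Cast the values at the given column positions to str, leaving others unchanged."""
    return [
        [str(v) if j in positions else v for j, v in enumerate(row)]
        for row in rows
    ]


# Made-up rows: the "status" column mixes strings with the integer 0.
columns = ["age", "income", "status"]
rows = [[25, 4.0, "Yes"], [31, 5.2, 0], [40, 6.1, "No"]]

print(mixed_type_columns(rows, columns))   # ['status']
clean = stringify(rows, positions={2})
print(mixed_type_columns(clean, columns))  # []
```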

Upvotes: 0
