Sayan

Reputation: 57

Error while passing a DataFrame through k-means

My DataFrame appears to contain only float values everywhere, but when I pass it to k-means I get an error saying it could not convert a string to float.

How can I convert NaN values, if there are any, to float values across the entire DataFrame?

Upvotes: 0

Views: 1296

Answers (2)

Tacratis

Reputation: 1055

Based on your code, it seems that you only instantiated KMeans but never used it. You'll need clean input data (i.e. no strings etc.); let's call it X.

from sklearn.cluster import KMeans

kmeans = KMeans(n_clusters=4, init='k-means++', max_iter=600)
clusters = kmeans.fit_predict(X)

Now clusters holds the cluster number for each sample in X.
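Since the reported error is about a string that could not be converted to float, it may help to check which columns are actually non-numeric before building X. The following is only a sketch: the frame, its column names, and the stray "n/a" value are hypothetical, and dropping rows with NaN is just one possible choice (filling them is another).

```python
import pandas as pd

# Hypothetical frame: 'income' was read as strings because of a stray "n/a".
df = pd.DataFrame({
    "zip": [10001, 10002, 10003, 10004],
    "income": ["52000.0", "61000.5", "n/a", "48000.0"],
})

# Columns that are not numeric are the ones KMeans cannot handle.
non_numeric = df.select_dtypes(exclude="number").columns.tolist()

# Coerce them to floats; values that cannot be parsed become NaN.
X = df.copy()
for col in non_numeric:
    X[col] = pd.to_numeric(X[col], errors="coerce")

# KMeans does not accept NaN either, so drop (or fill) the affected rows.
X = X.dropna()
```

Printing df.dtypes on the real data is a quick way to see whether a column that looks like floats was in fact loaded as strings.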

(Alternatively, you can call fit(X) and then later predict(X) separately; either way, it is the predicted cluster labels that you will need.)

If you later want cluster labels for new data, use kmeans.predict(new_data) rather than fit_predict(), so that KMeans applies what it learned from X to new_data (or, depending on your needs, you might want to retrain it).
Hope this helps.
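The fit-then-predict flow described above could look roughly like this. The arrays are made-up toy data, and n_init and random_state are set only to keep the sketch reproducible.

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical data: two well-separated groups of points.
X = np.array([[0.0, 0.0], [0.1, 0.2], [0.2, 0.1],
              [5.0, 5.0], [5.1, 5.2], [5.2, 5.1]])
new_data = np.array([[0.05, 0.05], [5.05, 5.05]])

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
kmeans.fit(X)                          # learn the centroids from X only

train_labels = kmeans.labels_          # labels for the rows of X
new_labels = kmeans.predict(new_data)  # nearest learned centroid for each new row
```

Calling fit_predict(new_data) instead would discard the centroids learned from X and cluster new_data from scratch.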

Finally, you can add another column to your pandas DataFrame by doing:

df['cluster'] = clusters

where 'cluster' is the name of the new column; you can of course call it whatever you want.
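One thing to watch: if rows were dropped while cleaning, clusters is shorter than df and a plain assignment raises a length error. A possible workaround, sketched here with a hypothetical frame, is to wrap the labels in a Series that carries the index of X so pandas can align them.

```python
import pandas as pd
from sklearn.cluster import KMeans

# Hypothetical frame with one missing value in column 'a'.
df = pd.DataFrame({"a": [0.0, 0.1, None, 5.0, 5.1],
                   "b": [0.0, 0.2, 1.0, 5.0, 5.2]})
X = df.dropna()  # 4 of the 5 rows survive

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
clusters = kmeans.fit_predict(X)

# Index-aligned assignment: the dropped row gets NaN instead of an error.
df["cluster"] = pd.Series(clusters, index=X.index)
```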

Upvotes: 0

Yoshitha Penaganti

Reputation: 464

This should do the job: it converts all the string columns to categorical codes. Alternatively, you can one-hot encode the variables in these columns.

import pandas as pd
from sklearn.cluster import KMeans

df = pd.read_csv('zipIncome.csv')
print(df)

# Encode every non-numeric column as integer category codes
for col_name in df.select_dtypes(exclude='number').columns:
    df[col_name] = df[col_name].astype('category').cat.codes

kmeans = KMeans(n_clusters=4, init='k-means++', max_iter=600).fit(df)
print(kmeans.labels_)
print(kmeans.cluster_centers_)
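The one-hot alternative mentioned above is not shown in the snippet, so here is a minimal sketch using pandas.get_dummies. The frame and its column names are hypothetical stand-ins, not the contents of zipIncome.csv.

```python
import pandas as pd
from sklearn.cluster import KMeans

# Hypothetical frame with one string column.
df = pd.DataFrame({
    "income": [30.0, 32.0, 80.0, 85.0],
    "state": ["NY", "CA", "NY", "TX"],
})

# One indicator column per category; numeric columns pass through unchanged.
X = pd.get_dummies(df, columns=["state"], dtype=float)

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(kmeans.labels_)
```

Category codes impose an arbitrary numeric order on the categories, which k-means then treats as real distances; one-hot encoding avoids that at the cost of extra columns.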

Upvotes: 1
