KMeans clustering using only a specific CSV column

Question

I am following a tutorial to learn how to use KMeans:

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from matplotlib import style
style.use("ggplot")
from sklearn.cluster import KMeans



X = np.array([[1, 2],
              [5, 8],
              [1.5, 1.8],
              [8, 8],
              [1, 0.6],
              [9, 11]])


kmeans = KMeans(n_clusters=2)
kmeans.fit(X)

centroids = kmeans.cluster_centers_
labels = kmeans.labels_

print(centroids)
print(labels)

colors = ["g.","r.","c.","y."]

for i in range(len(X)):
    print("coordinate:",X[i], "label:", labels[i])
    plt.plot(X[i][0], X[i][1], colors[labels[i]], markersize = 10)


plt.scatter(centroids[:, 0],centroids[:, 1], marker = "x", s=150, linewidths = 5, zorder = 10)

plt.show()

I want to read a CSV file and use one of the DataFrame columns in place of the array used above.

I tried the following, but it didn't work:

df=pd.read_csv("Output.csv",encoding='latin1')
X=pd.DataFrame([['Column_1']])

I got the following error:

ValueError: could not convert string to float: 'Column_1'

This is what the output looks like when I call df.head():

    x    id  ... Column_name v      Column_1
0  25  0001  ...         NaN             854
1  28  0002  ...         NaN            85,4
2  29  0003  ...         NaN            1524
3  32  NaN   ...         NaN               0
4  85  0004  ...         NaN               0
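
For reference, `pd.DataFrame([['Column_1']])` builds a new one-cell DataFrame holding the string 'Column_1' rather than selecting that column from `df`, which is why KMeans reports that it cannot convert 'Column_1' to float. Below is a minimal sketch of one way to pass a single column instead. It rests on assumptions not confirmed by the question: the in-memory `csv_text` and its `;` separator are stand-ins for Output.csv, and values such as `85,4` are treated as decimal-comma numbers.

```python
import io

import pandas as pd
from sklearn.cluster import KMeans

# Hypothetical stand-in for Output.csv so the sketch runs on its own.
# The Column_1 values mirror df.head() above; the ';' separator is an assumption.
csv_text = "x;Column_1\n25;854\n28;85,4\n29;1524\n32;0\n85;0\n"
df = pd.read_csv(io.StringIO(csv_text), sep=";")

# "85,4" makes pandas read the column as strings, so normalise the decimal
# comma and convert; values that still cannot be parsed become NaN.
as_text = df["Column_1"].astype(str).str.replace(",", ".", regex=False)
df["Column_1"] = pd.to_numeric(as_text, errors="coerce")

# Double brackets select a one-column DataFrame, so the resulting array has
# shape (n_samples, 1), the 2-D layout KMeans expects. NaN rows are dropped.
X = df[["Column_1"]].dropna().to_numpy()

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
kmeans.fit(X)

print(kmeans.cluster_centers_)
print(kmeans.labels_)
```

With the real file, the `read_csv` call from the question would replace the stand-in. If the whole file uses decimal commas, passing `decimal=","` to `pd.read_csv` may handle the conversion at load time. Note that the plotting loop from the tutorial assumes two features per row, so it would need adjusting for one-dimensional data.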
