How can I do KMeans clustering in python for 8 columns in a data-frame of 14 columns?

Question

I am trying to do clustering for the data-frame given to me. It has 14 columns. How to do clustering for 8 of those?

Below is the code that I found and followed.

Elbow method:

Elbow_method

Visualization

# K-Means Clustering

# importing the libraries
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

# importing tha customer Expenses Invoices dataset with pandas
dataset=pd.read_csv('Expense_Invoice.csv')
X=dataset.iloc[: , [3,2]].values

# Using the elbow method to find  the optimal number of clusters
from sklearn.cluster import KMeans
wcss = []
for i in range(1, 11):
  kmeans=KMeans(n_clusters=i, init='k-means++', max_iter= 300, n_init= 10, random_state= 0)
  kmeans.fit(X)
  wcss.append(kmeans.inertia_)
plt.plot(range(1, 11),wcss)
plt.title('The Elbow Method')
plt.xlabel('Number of clusters K')
plt.ylabel('Average Within-Cluster distance to Centroid (WCSS)')  
plt.show()

# Applying k-means to the mall dataset
kmeans=KMeans(n_clusters=3, init='k-means++', max_iter= 300, n_init= 10, random_state= 0)
y_kmeans=kmeans.fit_predict(X)

# Visualizing the clusters
plt.scatter(X[y_kmeans == 0, 0], X[y_kmeans == 0, 1], s = 100, c = 'red', label='Careful(c1)')
plt.scatter(X[y_kmeans == 2, 0], X[y_kmeans == 2, 1], s = 100, c = 'green', label='Standard(c2)')
plt.scatter(X[y_kmeans == 1, 0], X[y_kmeans == 1, 1], s = 100, c = 'blue', label='Target(c3)')
plt.scatter(kmeans.cluster_centers_[:, 0], kmeans.cluster_centers_[:, 1], s = 250, c = 'yellow', 
            label='Centroids')
plt.title('Clusters of customer Invoices & Expenses')
plt.xlabel('Total Invoices ')
plt.ylabel('Total Expenses')
plt.legend()
plt.show()

This works perfectly but this is only for two columns(variables), i want to have it for 8 column. But I could not understand how?

How can I do KMeans clustering in python for 8 columns in a data-frame of 14 columns?

Answers (1)

Related Questions