Joud

Reputation: 11

K-Means taking a long time

I'm using k-means for my project for the first time. My dataset has more than 400,000 rows and 11 columns, and I run k-means for k = 3, 5, 7, 9, and 10. It has been running for more than 65 minutes and there's still no output. Is that normal? It's my first time, so I'm not sure what to expect.

I'm using Python in Visual Studio.

from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score
import matplotlib.pyplot as plt

sse = []
silhouette_scores = []
k_values = [3, 5, 7, 9]

for k in k_values:
    kmeans = KMeans(n_clusters=k, random_state=42, n_init=1, init='k-means++')
    kmeans.fit(x_pca)
    sse.append(kmeans.inertia_)
    silhouette_scores.append(silhouette_score(x_pca, kmeans.labels_))

# the elbow method
plt.figure(figsize=(10, 6))
plt.plot(k_values, sse, marker='o')
plt.title('Elbow Method for Optimal k')
plt.xlabel('Number of clusters (k)')
plt.ylabel('Sum of Squared Errors (SSE)')
plt.show()

# silhouette scores
plt.figure(figsize=(10, 6))
plt.plot(k_values, silhouette_scores, marker='o')
plt.title('Silhouette Score for Optimal k')
plt.xlabel('Number of clusters (k)')
plt.ylabel('Silhouette Score')
plt.show()

Upvotes: 1

Views: 71

Answers (1)

Teemu Risikko

Reputation: 3275

Analysis

It's not your K-means that is slow, it's silhouette_score.

The time complexity of K-means is

O(n × K × I × d)

- n = number of points
- K = number of clusters
- I = number of iterations
- d = number of attributes

The hardest of these to estimate is I, but it's safe to assume it's in the ballpark of ~10-50 iterations, with a maximum of 300 (max_iter) in sklearn's default settings.

So for your dataset of 400,000 points, 3-9 clusters, 10-300 iterations and 11 attributes, the worst case is 400,000 × 9 × 300 × 11 = 11,880,000,000 operations, or about 1.2 × 10^10. In practical terms, this took me 0.4 seconds for an example dataframe of your dimensions with k=3.
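If you want to sanity-check those numbers on your own machine, here is a minimal sketch (assuming a random array standing in for your x_pca, so the exact timing and iteration count will differ) that times a single fit and prints how many iterations K-means actually needed:

import time

import numpy as np
from sklearn.cluster import KMeans

# Random stand-in with the same shape as the data in the question (400,000 x 11).
rng = np.random.default_rng(42)
X = rng.standard_normal((400_000, 11))

start = time.perf_counter()
kmeans = KMeans(n_clusters=3, random_state=42, n_init=1, init='k-means++').fit(X)
elapsed = time.perf_counter() - start

# n_iter_ is the I in the formula above; inertia_ is the SSE.
print(f"fit took {elapsed:.2f} s, converged in {kmeans.n_iter_} iterations")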

However, the complexity of silhouette_score is O(n²), so in your case 400,000² = 160,000,000,000 or 1.6 × 10^11. That's more than a tenfold increase over even the worst case of the K-means calculation, and in practice the gap is far bigger: it took me 30 minutes to run even one silhouette pass with k=3, versus 0.4 seconds for the fit.

What to do?

So for your question of "is that normal?", the answer is yes.

It's possible to calculate the distance matrix beforehand with sklearn's pairwise_distances and then pass it to silhouette_score(precomputed_distance_matrix_of_x_pca, kmeans.labels_, metric="precomputed"), so that the distances don't have to be recomputed for every k (see the sketch further below). However, it's unlikely to give any significant speedup, and you might easily run into this:

numpy.core._exceptions._ArrayMemoryError: Unable to allocate 1.16 TiB for an array with shape (400000, 400000) and data type float64

Oopsie.
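For reference, the precomputed variant would look roughly like this. The sketch below assumes x_pca is a NumPy array and subsamples it first (the 10,000 points are my assumption, not something from the question), because the full 400,000 × 400,000 matrix is exactly what triggers the allocation error above:

import numpy as np
from sklearn.metrics import pairwise_distances, silhouette_score

# Subsample so the distance matrix stays manageable (~0.8 GB at 10,000 points);
# the full 400,000-point matrix would need the ~1.16 TiB mentioned above.
idx = np.random.default_rng(0).choice(len(x_pca), size=10_000, replace=False)
x_sub = x_pca[idx]
labels_sub = kmeans.labels_[idx]

dist = pairwise_distances(x_sub)
score = silhouette_score(dist, labels_sub, metric="precomputed")
print(score)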

Another option mentioned elsewhere is to add n_jobs=-1 to the parameters of silhouette_score, which uses all CPU cores for the distance calculation. This might marginally improve the speed, but for me a second run with this took 60 minutes with k=3 (so double the time), possibly because I was running the code inside a devcontainer.
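That change is just an extra keyword argument; silhouette_score forwards unknown keyword arguments to the underlying pairwise-distance computation, so something like this is all it takes:

# Use all CPU cores for the pairwise-distance part of the silhouette calculation.
score = silhouette_score(x_pca, kmeans.labels_, n_jobs=-1)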

Your best option is probably to look for an alternative approach to selecting the number of clusters, or just let the code run as long as it needs. At the very least, consider adding prints between loop iterations so you can see the progress.
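If you go the "let it run" route, a lightly modified version of your loop (same logic, just with timing prints added) at least shows which k it is currently on and how long each step takes:

import time

for k in k_values:
    t0 = time.perf_counter()
    kmeans = KMeans(n_clusters=k, random_state=42, n_init=1, init='k-means++')
    kmeans.fit(x_pca)
    print(f"k={k}: fit finished in {time.perf_counter() - t0:.1f} s")

    t0 = time.perf_counter()
    sse.append(kmeans.inertia_)
    silhouette_scores.append(silhouette_score(x_pca, kmeans.labels_))
    print(f"k={k}: SSE and silhouette finished in {time.perf_counter() - t0:.1f} s")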

Upvotes: 2
