Lleims

Reputation: 1353

Save a k-means model to cluster new data of the same kind later

I am currently clustering a data set. Is there any way to save the resulting groups so that, in the future, I can take new data and find out which group each point belongs to according to the k-means "model" I built?

I have learned to work with k-means and find it very interesting, but whenever I want to know which group a new data point belongs to, I currently repeat the whole analysis. What I would like is to use the old data (we could call it training data) to determine the group of a new data point. Is that possible?

This is my code right now.

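from sklearn.cluster import KMeans

n_clusters = 15
kmeans = KMeans(n_clusters=n_clusters, init='k-means++', max_iter=3000, n_init=100, random_state=0)
# Fit the model and get the cluster label of every row in data
y_kmeans = kmeans.fit_predict(data)

# Store the labels next to the original rows
data_df['k-means'] = y_kmeans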

When I plot my current results, the clusters already cover the whole range of the data. Therefore, any new data point must belong to one of the current groups.

import matplotlib.pyplot as plt

# Visualising the clusters (one colour per cluster)
colors = ['blue', 'orange', 'green', 'red', 'yellow', 'cyan', 'brown', 'cadetblue', 'gray',
          'salmon', 'olive', 'deeppink', 'pink', 'gold', 'lime']
for i in range(n_clusters):
    plt.scatter(data[y_kmeans == i, 0], data[y_kmeans == i, 1], color=colors[i])

# Plotting the centroids of the clusters
plt.scatter(kmeans.cluster_centers_[:, 0], kmeans.cluster_centers_[:, 1], label='Centroids')

plt.legend()

Obviously, as new data comes in, I will also re-examine the data for variations.

Thank you very much.

Upvotes: 0

Views: 1587

Answers (1)

ypnos

Reputation: 52317

You can simply keep the cluster centers and assign each new data point to the nearest cluster center (i.e., the one that minimizes the Euclidean distance).

This is what the prediction step in k-means does.
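As a rough check of that claim, the sketch below fits a small model and compares predict with a hand-written nearest-center assignment. The three-blob training data and the new points are made up for illustration and stand in for the asker's data; only KMeans, predict and cluster_centers_ come from scikit-learn.

```python
import numpy as np
from sklearn.cluster import KMeans

# Synthetic stand-in for the training data (assumption: three well-separated 2-D blobs)
rng = np.random.default_rng(0)
train = np.vstack([
    rng.normal(loc=(0, 0), scale=0.3, size=(50, 2)),
    rng.normal(loc=(5, 5), scale=0.3, size=(50, 2)),
    rng.normal(loc=(0, 5), scale=0.3, size=(50, 2)),
])

kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(train)

# Hypothetical new observations, one near each blob
new_points = np.array([[0.1, -0.2], [4.8, 5.1], [0.2, 4.9]])

# Built-in assignment using the fitted model
labels_predict = kmeans.predict(new_points)

# Manual assignment: Euclidean distance to every center, then take the closest
centers = kmeans.cluster_centers_
dists = np.linalg.norm(new_points[:, None, :] - centers[None, :, :], axis=2)
labels_manual = dists.argmin(axis=1)

print(labels_predict, labels_manual)
```

Both arrays should agree, which is why keeping only the centers is enough to label new points.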

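The cluster centers are available as kmeans.cluster_centers_ (on the fitted KMeans object, not on the y_kmeans label array). Calling kmeans.predict(new_data) on the fitted object performs this assignment for you.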
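One possible way to keep the centers between sessions is to write them to disk with NumPy and reload them later. This is only a sketch: the center values, the file name kmeans_centers.npy and the helper assign are hypothetical, and the hard-coded array stands in for kmeans.cluster_centers_.

```python
import os
import tempfile

import numpy as np

# Stand-in for kmeans.cluster_centers_ (hypothetical values)
centers = np.array([[0.0, 0.0], [5.0, 5.0], [0.0, 5.0]])

# Save the centers once, after fitting (file name is an assumption)
path = os.path.join(tempfile.mkdtemp(), "kmeans_centers.npy")
np.save(path, centers)

# In a later session: reload the centers instead of re-running the clustering
loaded = np.load(path)


def assign(points, centers):
    """Return the index of the nearest center (Euclidean distance) for each point."""
    points = np.atleast_2d(points)
    dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
    return dists.argmin(axis=1)


new_labels = assign(np.array([[0.2, 0.1], [4.5, 5.5]]), loaded)
print(new_labels)
```

Serializing the whole fitted estimator (for example with pickle or joblib) is another option if you would rather call predict directly after loading.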

Upvotes: 1
