assign cluster membership to new data using kmodes

Question

Looking at this code from here:

import numpy as np
from kmodes.kmodes import KModes

# random categorical data
data = np.random.choice(20, (100, 10))

km = KModes(n_clusters=4, init='Huang', n_init=5, verbose=1)
clusters = km.fit_predict(data)

# Print the cluster centroids
print(km.cluster_centroids_)

Does anyone know how to save the "clustering model" and apply it to new data? In other words, how can I cluster previously unseen data? Thanks.

artemis · Accepted Answer

You can use pickle for this task.

import pickle

with open('cluster_model.pickle', 'wb') as f:
    pickle.dump(km, f)

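To apply it to new data later, load the model back and call predict. The new data must have the same attributes (columns), in the same order, as the data the model was fitted on: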

import pickle

with open('cluster_model.pickle', 'rb') as f:
    km = pickle.load(f)

# If your new data is called "new_data", you can:
new_clusters = km.predict(new_data)
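For intuition about what predict does with the loaded model: with KModes' default cat_dissim (simple matching dissimilarity), each new row should be assigned to the centroid it disagrees with on the fewest attributes. The sketch below reproduces that rule with plain NumPy. It assumes the default dissimilarity, and assign_to_modes is a hypothetical helper for illustration, not part of the kmodes API.

```python
import numpy as np


def assign_to_modes(X, centroids):
    """Hypothetical helper (not part of kmodes): assign each row of X to the
    centroid it differs from on the fewest attributes, i.e. nearest centroid
    under simple matching dissimilarity (assumed to be KModes' default)."""
    X = np.asarray(X)
    centroids = np.asarray(centroids)
    # mismatches[i, k] = number of attributes where row i differs from centroid k
    mismatches = (X[:, None, :] != centroids[None, :, :]).sum(axis=2)
    # np.argmin returns the first minimum, so ties go to the lowest cluster index
    return np.argmin(mismatches, axis=1)


centroids = np.array([[0, 0, 0],
                      [1, 1, 1]])
new_data = np.array([[0, 0, 1],
                     [1, 1, 0],
                     [2, 1, 1]])
print(assign_to_modes(new_data, centroids))  # [0 1 1]
```

With a fitted model you could compare assign_to_modes(new_data, km.cluster_centroids_) against km.predict(new_data); under the default settings they are expected to agree, though that is an assumption about the library's internals rather than documented behaviour.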
