I am trying to understand what the precompute_distances parameter of KMeans is for:
class sklearn.cluster.KMeans(n_clusters=8, init='k-means++', n_init=10,
max_iter=300, tol=0.0001, precompute_distances='auto', verbose=0,
random_state=None, copy_x=True, n_jobs=1, algorithm='auto')
Which distances does it precompute?
In each k-means iteration, every sample has to be assigned (labeled) to its closest cluster center. If precompute_distances is True, this is done with metrics.pairwise_distances_argmin_min(). If it is False, it is done with cluster._k_means._assign_labels_array().
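The first method works on whole matrices at once (the distances from all samples to all centers), while the second computes the distance for one sample-center pair at a time. That is why precompute_distances=True is faster but uses more memory. With the default 'auto', distances are precomputed only when n_samples * n_clusters is below 12 million, which corresponds to roughly 100 MB of extra memory per job in double precision.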
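To make the difference concrete, here is a minimal NumPy sketch of the two strategies. Neither function is scikit-learn code, and both names (`labels_precomputed`, `labels_pairwise_loop`) are hypothetical: the first builds the full matrix of squared Euclidean distances in one vectorized step, roughly what the precomputed path achieves (scikit-learn itself calls `pairwise_distances_argmin_min` for this), and the second is a pure-Python stand-in for the compiled per-pair loop.

```python
import numpy as np

def labels_precomputed(X, centers):
    # Full (n_samples, n_clusters) matrix of squared Euclidean distances,
    # using ||x - c||^2 = ||x||^2 - 2 x.c + ||c||^2 (one matrix product).
    d2 = (
        (X ** 2).sum(axis=1)[:, None]
        - 2.0 * X @ centers.T
        + (centers ** 2).sum(axis=1)[None, :]
    )
    np.maximum(d2, 0.0, out=d2)  # clip tiny negatives caused by rounding
    labels = d2.argmin(axis=1)
    # Return the label and the minimum squared distance of each sample.
    return labels, d2[np.arange(X.shape[0]), labels]

def labels_pairwise_loop(X, centers):
    # One sample-center distance at a time; only the running minimum per
    # sample is kept, so no (n_samples, n_clusters) matrix is allocated.
    n_samples = X.shape[0]
    labels = np.empty(n_samples, dtype=np.intp)
    mins = np.empty(n_samples)
    for i in range(n_samples):
        best, best_j = np.inf, -1
        for j in range(centers.shape[0]):
            diff = X[i] - centers[j]
            d = float(diff @ diff)
            if d < best:
                best, best_j = d, j
        labels[i], mins[i] = best_j, best
    return labels, mins
```

Both give the same labels; the first allocates an n_samples × n_clusters float64 array (8 × n_samples × n_clusters bytes), the second only one running minimum per sample. In pure Python the loop is obviously much slower; in scikit-learn both paths are compiled, so the speed-up of the precomputed path probably comes largely from computing the `X @ centers.T` term as a single BLAS matrix product.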
So the distances being precomputed are the distances from every sample to every current cluster center. They cannot be cached between iterations, because the centers move after every iteration.
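Why the distances cannot be reused is visible in a bare-bones Lloyd iteration. This is a simplified NumPy sketch with hypothetical helper names (`assign`, `lloyd`), not scikit-learn's implementation (no k-means++ initialization, no tolerance-based stopping): the assignment step recomputes the full sample-to-center distance matrix every time, because the update step has just moved the centers.

```python
import numpy as np

def assign(X, centers):
    # Squared distances from every sample to every *current* center.
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1), d2

def lloyd(X, centers, n_iter=10):
    history = []  # distance matrix seen at each iteration
    for _ in range(n_iter):
        # Must be redone every iteration: the centers changed last time.
        labels, d2 = assign(X, centers)
        history.append(d2)
        # Update step: move each center to the mean of its samples
        # (keep the old center if its cluster is empty).
        new_centers = np.array([
            X[labels == k].mean(axis=0) if np.any(labels == k) else centers[k]
            for k in range(centers.shape[0])
        ])
        if np.allclose(new_centers, centers):
            break  # centers stopped moving, so the distances would not change
        centers = new_centers
    return centers, labels, history
```

Comparing `history[0]` with `history[1]` shows that the distance matrix changes as soon as the centers move, so any distances stored from the previous iteration are stale.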