Reputation: 823
I am testing the machine learning tools in Vertica. I understand how KMEANS works, since it just divides the data into clusters. However, I do not understand how APPLY_KMEANS works on new data.
It looks to me like it acts more like a classification method, since it classifies new data into the existing clusters. So what algorithm is used (k-nearest neighbors)? It's not very clear from the documentation.
Upvotes: 0
Views: 205
Reputation: 2109
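k-means is a clustering algorithm (not classification!) that iterates over 2 steps: an assignment step, where each point is assigned to its nearest centroid, and an update step, where each centroid is recomputed as the mean of the points assigned to it.
When you build your k-means model, you first initialize the centroids (there are different strategies; random initialization is one of them), then you iterate until the clustering is good enough (for example, until your error is below a given threshold).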
What defines your model is actually your centroids.
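To make the training loop concrete, here is a minimal pure-Python sketch of the standard k-means iteration. It is not Vertica's implementation: the function name `kmeans`, its parameters, the Euclidean distance, and the "centroids stopped moving" stopping rule are all assumptions chosen for illustration.

```python
import math
import random


def kmeans(points, k, max_iter=100, tol=1e-9, seed=0):
    """Toy k-means (illustrative only); returns the k centroids, i.e. the 'model'."""
    rng = random.Random(seed)
    # Random initialization: pick k of the input points as starting centroids.
    centroids = [tuple(p) for p in rng.sample(points, k)]
    for _ in range(max_iter):
        # Assignment step: group each point under its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its assigned points.
        new_centroids = []
        for i, members in enumerate(clusters):
            if members:
                mean = tuple(sum(coord) / len(members) for coord in zip(*members))
                new_centroids.append(mean)
            else:
                # An empty cluster keeps its previous centroid.
                new_centroids.append(centroids[i])
        # Stop once no centroid moved by more than the tolerance.
        shift = max(math.dist(a, b) for a, b in zip(centroids, new_centroids))
        centroids = new_centroids
        if shift <= tol:
            break
    return centroids
```

Note that the function returns only the centroids: nothing else needs to be stored to place new points into clusters later.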
When using APPLY_KMEANS, you run only an assignment step, using the data from your query and the centroids from your model. Each point is assigned to the cluster whose centroid is closest to it, so this is a nearest-centroid lookup rather than k-nearest neighbors.
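As a rough illustration of that assignment step, here is a small Python sketch. It is not Vertica code: the helper name `assign_to_nearest_centroid`, the sample centroids and points, and the choice of Euclidean distance are all assumptions made for the example.

```python
import math


def assign_to_nearest_centroid(points, centroids):
    """Return, for each point, the index of its closest centroid (hypothetical helper)."""
    return [
        min(range(len(centroids)), key=lambda i: math.dist(p, centroids[i]))
        for p in points
    ]


# Centroids as they might come out of an already trained model (made-up values).
centroids = [(0.0, 0.0), (10.0, 10.0)]
# New data that was never seen during training.
new_points = [(1.0, 2.0), (9.0, 8.5), (4.0, 4.0)]

print(assign_to_nearest_centroid(new_points, centroids))  # prints [0, 1, 0]
```

The centroids are not updated here, and the training points are not consulted at all. That is the difference from k-nearest neighbors, which would compare each new point against the stored, labeled training rows.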
Hope it helps
pltrdy
Note about Clustering vs Classification:
It is tempting to think of clustering as a kind of classification. However, "classification" should only refer to supervised learning (learning from labeled examples), while clustering is unsupervised learning. So it is better not to mix the two terms :)
Upvotes: 1