Lin Ma

Reputation: 10139

Hadoop distributed version of K-Means?

Is there an open source implementation of K-Means that runs distributed on Hadoop? I am asking about Hadoop because the data is too big to fit on a single machine.

Thanks in advance, Lin

Upvotes: 0

Views: 127

Answers (2)

Azwaw

Reputation: 558

You can use Spark for this; its MLlib library includes a KMeans implementation. Spark is built on RDDs (Resilient Distributed Datasets): your data is distributed across the cluster, and each node processes the partitions it holds locally.

Spark can outperform Mahout because it keeps some intermediate results in memory instead of writing them to HDFS.
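
To make the "each node processes its own data" idea concrete, here is a minimal single-machine sketch of one way a distributed K-Means iteration can be structured. It is plain Python, not Spark or Mahout code, and the function names (`assign_partition`, `merge_partials`, `kmeans`) are made up for illustration. Each inner list stands in for the data held by one node; a real cluster would run the map step in parallel and only ship the small per-cluster sums over the network.

```python
from math import dist  # Euclidean distance, Python 3.8+


def assign_partition(points, centroids):
    """Map step (hypothetical name): runs on one partition.

    Assigns each local point to its nearest centroid and returns
    {centroid_index: (coordinate_sums, point_count)}.
    """
    partial = {}
    for p in points:
        idx = min(range(len(centroids)), key=lambda i: dist(p, centroids[i]))
        sums, count = partial.get(idx, ([0.0] * len(p), 0))
        partial[idx] = ([a + b for a, b in zip(sums, p)], count + 1)
    return partial


def merge_partials(partials, centroids):
    """Reduce step (hypothetical name): combines the per-partition sums
    and returns the new centroids. Empty clusters keep their old centroid."""
    totals = {}
    for partial in partials:
        for idx, (sums, count) in partial.items():
            t_sums, t_count = totals.get(idx, ([0.0] * len(sums), 0))
            totals[idx] = ([a + b for a, b in zip(t_sums, sums)], t_count + count)
    new_centroids = []
    for i, old in enumerate(centroids):
        if i in totals:
            sums, count = totals[i]
            new_centroids.append(tuple(v / count for v in sums))
        else:
            new_centroids.append(tuple(old))
    return new_centroids


def kmeans(partitions, centroids, iterations=10):
    """Runs a fixed number of iterations; no convergence check in this sketch."""
    for _ in range(iterations):
        partials = [assign_partition(part, centroids) for part in partitions]
        centroids = merge_partials(partials, centroids)
    return centroids


# Two "nodes", each holding part of the data.
partitions = [
    [(0.0, 0.0), (10.0, 10.0)],
    [(0.0, 2.0), (10.0, 12.0)],
]
result = kmeans(partitions, [(1.0, 1.0), (9.0, 9.0)])
print(result)
```

The point of the split is that the full data set never has to sit on one machine: only the per-cluster sums and counts are merged between iterations.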

Upvotes: 3

bpgergo

Reputation: 16037

Yes, there is. Mahout has several k-means implementations, e.g.: mahout.apache.org/users/clustering/k-means-clustering.html

Upvotes: 1
