Reputation: 10139
I'm wondering if there is an open-source implementation of a distributed K-Means for Hadoop. I'm asking about Hadoop because the data is too big to fit on a single machine.
Thanks in advance, Lin
Upvotes: 0
Views: 127
Reputation: 558
You can use Spark for this; it ships a K-Means implementation in its MLlib library. Spark is built around RDDs (Resilient Distributed Datasets): your data is partitioned across the cluster, and each node processes the partitions stored closest to it.
Spark can perform better than Mahout because some intermediate results (for K-Means, the state between iterations) are kept in memory instead of being written to HDFS.
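To illustrate why K-Means distributes well, here is a minimal pure-Python sketch of one common data-parallel scheme. It does not use Spark or Mahout. The "partitions" are plain lists standing in for blocks of data on different nodes, and the helper names (`nearest`, `partial_sums`, `merge`, `kmeans`) are hypothetical. Each partition computes per-cluster partial sums locally, and only those small summaries are combined to produce the new centroids. In a real cluster you would use the library's own K-Means rather than code like this.

```python
import functools
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]
Partial = Tuple[List[List[float]], List[int]]  # (per-cluster sum vectors, per-cluster counts)


def nearest(point: Point, centroids: Sequence[Point]) -> int:
    """Index of the centroid closest to `point` (squared Euclidean distance)."""
    return min(
        range(len(centroids)),
        key=lambda i: sum((p - c) ** 2 for p, c in zip(point, centroids[i])),
    )


def partial_sums(partition: Sequence[Point], centroids: Sequence[Point]) -> Partial:
    """'Map' side: runs locally on one partition and emits only small per-cluster summaries."""
    dim = len(centroids[0])
    sums = [[0.0] * dim for _ in centroids]
    counts = [0] * len(centroids)
    for point in partition:
        i = nearest(point, centroids)
        counts[i] += 1
        for d, x in enumerate(point):
            sums[i][d] += x
    return sums, counts


def merge(a: Partial, b: Partial) -> Partial:
    """'Reduce' side: combines two partial results element-wise."""
    sums = [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a[0], b[0])]
    counts = [x + y for x, y in zip(a[1], b[1])]
    return sums, counts


def kmeans(partitions: Sequence[Sequence[Point]], centroids: Sequence[Point],
           iterations: int = 10) -> List[Point]:
    """Iterates until the centroids stop changing or `iterations` is reached."""
    current = [tuple(c) for c in centroids]
    for _ in range(iterations):
        # On a cluster, each partial_sums call would run on the node holding that partition.
        partials = [partial_sums(p, current) for p in partitions]
        sums, counts = functools.reduce(merge, partials)
        # An empty cluster keeps its old centroid.
        new = [tuple(x / n for x in s) if n else old
               for old, s, n in zip(current, sums, counts)]
        if new == current:
            break
        current = new
    return current


if __name__ == "__main__":
    parts = [[(0.0, 0.0), (0.0, 1.0)],
             [(10.0, 10.0), (10.0, 11.0)],
             [(1.0, 0.0), (11.0, 10.0)]]
    print(kmeans(parts, [(0.0, 0.0), (10.0, 10.0)]))
```

Here the centroid list stays in the driver's memory across iterations. That is roughly the advantage described above. A chain of MapReduce jobs instead writes each iteration's output back to HDFS before the next job reads it.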
Upvotes: 3
Reputation: 16037
Yes, there is. Mahout has several k-means implementations, e.g.: mahout.apache.org/users/clustering/k-means-clustering.html
Upvotes: 1