Reputation: 1
I am trying to run k-means clustering with Mahout on a 300 MB dataset containing only numerical values, but the k-means command runs out of memory after the second iteration. Why does the size increase after every iteration, and how can I resolve this issue?
Upvotes: 0
Views: 182
Reputation: 77454
Don't use Mahout for small data sets. Just don't.
300 MB easily fits into the main memory of any modern computer. An in-memory implementation (maybe try ELKI) will easily outperform Mahout, because it does not have the overhead of MapReduce.
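To illustrate what "in-memory" means here, this is a minimal sketch of plain Lloyd-style k-means in pure Python. It is not how ELKI or Mahout implement it; the function name `kmeans` and its parameters (`iterations`, `seed`) are made up for this example, and it assumes the data is already loaded as a list of equal-length numeric tuples. The point is only that each iteration reuses the same point list and a fixed set of k centroids, so nothing has to grow between iterations.

```python
import random


def kmeans(points, k, iterations=20, seed=0):
    """Plain Lloyd's algorithm on a list of equal-length numeric tuples.

    Hypothetical helper for illustration; returns (centroids, assignment).
    """
    rng = random.Random(seed)
    # Start from k distinct data points chosen at random.
    centroids = rng.sample(points, k)
    assignment = [0] * len(points)
    dims = len(points[0])

    for _ in range(iterations):
        # Assignment step: index of the nearest centroid (squared Euclidean).
        new_assignment = [
            min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])),
            )
            for p in points
        ]

        # Update step: each centroid becomes the mean of its assigned points.
        sums = [[0.0] * dims for _ in range(k)]
        counts = [0] * k
        for p, c in zip(points, new_assignment):
            counts[c] += 1
            for d, value in enumerate(p):
                sums[c][d] += value
        centroids = [
            # Keep the old centroid if a cluster ended up empty.
            tuple(s / counts[c] for s in sums[c]) if counts[c] else centroids[c]
            for c in range(k)
        ]

        # Stop once no point changes cluster.
        if new_assignment == assignment:
            break
        assignment = new_assignment

    return centroids, assignment
```

Calling `kmeans(points, k=2)` on a small list of 2-D tuples returns the two centroids and a cluster index per point. For real work on 300 MB of data you would want a vectorized or compiled implementation, but the memory pattern is the same: the data once, plus k centroids.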
Hadoop is not a one-size-fits-all solution. It is the super-size solution, and you don't have supersize data.
Any chance that you aren't even using a real cluster, but virtual machines? You might have too little disk space or memory assigned, or your cluster may not be well configured.
Upvotes: 1