Mahout k-means clustering command: facing heap space issues

I am trying to perform k-means clustering with Mahout on a 300 MB dataset containing only numerical values, but the k-means command runs out of heap space after the second iteration. Why does memory usage increase with every iteration, and how can I resolve this?

Upvotes: 0

Answers (1)

Has QUIT--Anony-Mousse

Reputation: 77495

Don't use Mahout for small data sets. Just don't.

300 MB easily fits into the main memory of any modern computer. An in-memory implementation (maybe try ELKI) will easily outperform Mahout on data of this size, because it does not have the overhead of MapReduce.
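To illustrate what "in-memory" means here, below is a minimal sketch of plain Lloyd's k-means in pure Python. It is not ELKI's or Mahout's implementation, and the function name `kmeans`, its parameters, and the evenly spaced initialization are assumptions made for this example (a real run would use a tuned library and a better seeding strategy). The point is that the only state carried between iterations is the k centroids and one label per point, so memory use stays flat from one iteration to the next.

```python
def kmeans(points, k, max_iter=100):
    """Plain Lloyd's k-means on a list of equal-length numeric tuples.

    Returns (centroids, assignments). Hypothetical helper for illustration.
    """
    n = len(points)
    dim = len(points[0])
    # Simplified deterministic init: k evenly spaced input points.
    centroids = [list(points[i * n // k]) for i in range(k)]
    assignments = [-1] * n

    for _ in range(max_iter):
        changed = False
        # Assignment step: attach each point to its nearest centroid
        # (squared Euclidean distance).
        for i, p in enumerate(points):
            best = min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])),
            )
            if best != assignments[i]:
                assignments[i] = best
                changed = True
        if not changed:
            break  # converged: no point switched cluster

        # Update step: move each centroid to the mean of its points.
        sums = [[0.0] * dim for _ in range(k)]
        counts = [0] * k
        for p, c in zip(points, assignments):
            counts[c] += 1
            for d in range(dim):
                sums[c][d] += p[d]
        for c in range(k):
            if counts[c]:  # an empty cluster keeps its previous centroid
                centroids[c] = [s / counts[c] for s in sums[c]]

    return centroids, assignments


if __name__ == "__main__":
    data = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1),
            (10.0, 10.0), (10.1, 9.9), (9.9, 10.2)]
    centers, labels = kmeans(data, k=2)
    print(centers, labels)
```

For a dataset of your size you would load the values once into an array and run a loop like this on a single machine, with no job scheduling or intermediate files between iterations.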

Hadoop is not a one-size-fits-all solution. It is the super-size solution, and you don't have super-size data.

Any chance that you aren't even using a real cluster, but virtual machines? You might have too little disk space or memory assigned, or your cluster may not be well configured.

Upvotes: 1
