Reputation: 705
I am running k-means on a multi-node cluster. The input size is about 100 MB, and I have modified the bin/mahout file like this:
.
.
.
MAHOUT_OPTS="$MAHOUT_OPTS -Dmapred.min.split.size=10MB"
.
.
MAHOUT_OPTS="$MAHOUT_OPTS -Dmapred.map.tasks=10"
On each iteration I get:
12/09/12 17:05:02 INFO mapred.JobClient: Launched map tasks=1
12/09/12 17:05:02 INFO mapred.JobClient: Launched reduce tasks=6
12/09/12 17:05:02 INFO mapred.JobClient: Data-local map tasks=1
Does this mean that it runs on a single node instead of on multiple nodes? And if so, what am I missing in the configuration?
Upvotes: 1
Views: 572
Reputation: 66876
Surely you want to set the maximum split size rather than the minimum if you want more splits. Either way, it is still only a suggestion to the cluster.
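To make that concrete, here is a rough sketch of the arithmetic, assuming the job uses Hadoop's new-API FileInputFormat, which picks a split size of max(minSize, min(maxSize, blockSize)). The 64 MB block size, the single splittable input file, and the 10 MB cap are assumptions for illustration, not values taken from the cluster in the question. The last line assumes the Hadoop 1.x property name mapred.max.split.size, gives the value in bytes, and assumes MAHOUT_OPTS actually reaches the job configuration in your Mahout version.

```shell
# Sketch only: mirrors split size = max(minSize, min(maxSize, blockSize)).
# All sizes below are illustrative assumptions.
input_bytes=$((100 * 1024 * 1024))   # ~100 MB input, as in the question
block_bytes=$((64 * 1024 * 1024))    # assumed HDFS block size
min_bytes=1                          # assumed minimum split size
max_bytes=$((10 * 1024 * 1024))      # desired cap: 10 MB per split

# min(maxSize, blockSize), then max(minSize, ...)
split=$(( max_bytes < block_bytes ? max_bytes : block_bytes ))
split=$(( split > min_bytes ? split : min_bytes ))

# Rough number of splits (and so map tasks), rounding up
splits=$(( (input_bytes + split - 1) / split ))
echo "split size: $split bytes, approx. map tasks: $splits"

# Hypothetical bin/mahout line: max (not min) split size, value in bytes
MAHOUT_OPTS="${MAHOUT_OPTS:-} -Dmapred.max.split.size=$max_bytes"
```

With these assumed numbers the estimate comes out at about 10 map tasks. Raising only the minimum split size can never increase the number of splits, which would be consistent with seeing a single map task.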
Upvotes: 3