Reputation: 1209
I have configured a 3-node cluster to run the WordCount MapReduce program. I am using a book of about 659 KB (http://www.gutenberg.org/ebooks/20417) as the test data. Interestingly, the web UI for that job shows only 1 map task, 1 reduce task, and 1 node involved. I am wondering whether this is because the data size is too small. If so, can I manually make the data split into multiple map tasks on multiple nodes?
Thanks, Allen
Upvotes: 1
Views: 1579
Reputation: 692
The default HDFS block size is 64 MB, so yes, the framework assigns only one task of each kind because your input data is smaller than a single block.
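For context, this is roughly how the new-API FileInputFormat decides the split size; a minimal sketch assuming stock Hadoop 1.x defaults, not values read from your actual cluster:

    // Sketch of how the new-API FileInputFormat picks a split size:
    //   splitSize = max(minSize, min(maxSize, blockSize))
    // The defaults below are assumptions for a stock Hadoop 1.x setup.
    public class SplitSizeSketch {
        public static void main(String[] args) {
            long blockSize = 64L * 1024 * 1024;   // dfs.block.size default, 64 MB
            long minSize   = 1L;                  // mapred.min.split.size default
            long maxSize   = Long.MAX_VALUE;      // mapred.max.split.size default
            long splitSize = Math.max(minSize, Math.min(maxSize, blockSize));
            // A 659 KB file is smaller than one split, so it becomes exactly one
            // map task; lowering maxSize below the file size yields more splits.
            System.out.println("split size = " + splitSize + " bytes");
        }
    }

To get more than one map task, you have a few options: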
1) Give it input data larger than 64 MB and see what happens,
2) Change the value of mapred.max.split.size, which is specific to MapReduce jobs (either in mapred-site.xml or by running the job with -D mapred.max.split.size=<noOfBytes>); see the sketch after this list,
or
3) Change the value of dfs.block.size, which has a more global scope and applies to all of HDFS (in hdfs-site.xml); a sketch of this also follows below.
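As a rough illustration only (the jar name, input/output paths, and byte values below are placeholders, not taken from your setup), option 2 could be passed on the command line, provided the driver uses GenericOptionsParser/ToolRunner as the stock WordCount example does:

    hadoop jar hadoop-examples.jar wordcount -D mapred.max.split.size=131072 input output

or set in mapred-site.xml:

    <property>
      <name>mapred.max.split.size</name>
      <value>131072</value>
    </property>

With a 128 KB maximum split size, the 659 KB book should be cut into several splits and hence several map tasks, which the scheduler can then spread over your nodes. For option 3, a similar property goes into hdfs-site.xml, and the value must be smaller than the file for it to be stored in more than one block:

    <property>
      <name>dfs.block.size</name>
      <value>131072</value>
    </property>

Note that dfs.block.size only affects files written after the change, so you would need to re-upload the book to HDFS for it to be stored in smaller blocks.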
Don't forget to restart your cluster for the changes to take effect if you modify the conf files.
Upvotes: 2