Reputation: 103
I would like to know which configuration causes a MapReduce job to run only one map task, even though an input split of 10000 and 1000 lines per map are set in the job configuration.
It's a 2-node cluster, and I tried a scan with startRow and endRow.
I want to have at least 2 map tasks, one on each machine.
Upvotes: 0
Views: 315
Reputation: 103
It's a row key issue. All the row keys share the same prefix, so the rows are stored on only one RegionServer (RS).
Upvotes: 0
Reputation: 3261
M/R tasks on HBase tables are split, by default, on region boundaries. If you only have one region for 10K rows, you will only get one mapper.
In that case, you can simply split the table's region so that it has 2 regions, and thus 2 mappers.
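To see why the split point matters, here is a rough plain-Java sketch. It does not use the HBase API. It only models regions as sorted key ranges and counts how many of them actually hold rows. The class name, method names, and the `user_` key format are all made up for illustration, and `String.compareTo` stands in for HBase's byte-wise key comparison (the two agree for ASCII keys like these).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical model, not HBase code: regions are defined by sorted split points.
public class RegionSplitSketch {

    // Region 0 holds keys below the first split point; region i holds keys
    // in [splitPoints[i-1], splitPoints[i]); the last region is open-ended.
    static int regionIndex(String rowKey, List<String> splitPoints) {
        int idx = 0;
        for (String split : splitPoints) {
            if (rowKey.compareTo(split) >= 0) {
                idx++;
            } else {
                break;
            }
        }
        return idx;
    }

    // Counts the regions that contain at least one row, i.e. the number of
    // mappers that would have any rows to process in this simplified model.
    static int regionsHoldingRows(List<String> rowKeys, List<String> splitPoints) {
        Set<Integer> regions = new TreeSet<>();
        for (String key : rowKeys) {
            regions.add(regionIndex(key, splitPoints));
        }
        return regions.size();
    }

    // Builds n keys that all share the same (assumed) prefix "user_".
    static List<String> sampleKeys(int n) {
        List<String> keys = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            keys.add(String.format("user_%05d", i));
        }
        return keys;
    }

    public static void main(String[] args) {
        List<String> keys = sampleKeys(10000);

        // No split points: a single region holds everything.
        System.out.println(regionsHoldingRows(keys, Collections.<String>emptyList())); // 1

        // Split at "m": every key starts with "user_", which sorts after "m",
        // so all rows still land in the same region.
        System.out.println(regionsHoldingRows(keys, Arrays.asList("m"))); // 1

        // Split inside the shared-prefix range: rows are divided between two regions.
        System.out.println(regionsHoldingRows(keys, Arrays.asList("user_05000"))); // 2
    }
}
```

The point of the sketch is that when all keys share a prefix, the split key has to fall inside that prefix's key range; otherwise the table gains a region but every row still sits in one of them.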
Upvotes: 1