tousif

Reputation: 103

What causes a MapReduce job to create only one map for 100000 rows in HBase?

I would like to know which configuration causes MapReduce to run only one map even though an input split of 10000 and lines per map of 1000 are set in the job configuration.

It's a 2-node cluster, and I tried a scan with startRow and endRow.

I want to have at least 2 maps, one on each machine.

Upvotes: 0

Views: 315

Answers (2)

tousif

Reputation: 103

It's a row key issue. All the row keys share the same prefix, so the rows are stored on only one RS (region server).
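To illustrate why a shared prefix keeps everything together, here is a minimal sketch. It is a simplified model, not the HBase client API: the class `RegionLookupSketch`, the split point `"m"`, and the `user_` keys are all hypothetical. It treats a table as a sorted map from region start key to region, and assumes ASCII keys so that Java string order matches byte order.

```java
import java.util.TreeMap;

// Simplified model of row-key-to-region lookup (hypothetical names, not HBase code).
class RegionLookupSketch {

    // A row belongs to the region with the greatest start key <= the row key.
    static String regionFor(TreeMap<String, String> regionsByStartKey, String rowKey) {
        return regionsByStartKey.floorEntry(rowKey).getValue();
    }

    public static void main(String[] args) {
        TreeMap<String, String> regions = new TreeMap<>();
        regions.put("", "region-1");   // the first region starts at the empty key
        regions.put("m", "region-2");  // assumed split point, for illustration only

        // Keys with a common prefix sort next to each other,
        // so they all resolve to the same region in this model.
        String[] keys = {"user_000001", "user_050000", "user_099999"};
        for (String key : keys) {
            System.out.println(key + " -> " + regionFor(regions, key));
        }
    }
}
```

In this model, every `user_` key resolves to the same region even though the table has two, because the split point does not fall inside the range those keys occupy.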

Upvotes: 0

David

Reputation: 3261

M/R tasks on HBase tables are split, by default, on region boundaries. If you only have one region for 10K rows, you will only get one mapper.

If you only have one region, you can simply split it so that your table has 2 regions, and thus 2 mappers.
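As a rough sketch of that rule, the following models "one map task per region that overlaps the scan range". It is a simplification under stated assumptions, not the actual input format code: `RegionSplitSketch`, `countMapTasks`, and the `row_` keys are hypothetical, keys are assumed to be ASCII, and an empty stop row is taken to mean "scan to the end of the table".

```java
import java.util.Arrays;
import java.util.List;

// Simplified model of region-based split counting (hypothetical names, not HBase code).
class RegionSplitSketch {

    // startKeys: sorted region start keys; region i covers
    // [startKeys[i], startKeys[i + 1]), and the last region is open-ended.
    // scanStart is inclusive, scanStop is exclusive ("" = end of table).
    static int countMapTasks(List<String> startKeys, String scanStart, String scanStop) {
        int tasks = 0;
        for (int i = 0; i < startKeys.size(); i++) {
            String regionStart = startKeys.get(i);
            // null marks the end of the table for the last region
            String regionEnd = (i + 1 < startKeys.size()) ? startKeys.get(i + 1) : null;

            boolean startsBeforeScanStop =
                    scanStop.isEmpty() || regionStart.compareTo(scanStop) < 0;
            boolean endsAfterScanStart =
                    regionEnd == null || regionEnd.compareTo(scanStart) > 0;

            if (startsBeforeScanStop && endsAfterScanStart) {
                tasks++; // one map task per overlapping region
            }
        }
        return tasks;
    }

    public static void main(String[] args) {
        List<String> oneRegion = Arrays.asList("");
        List<String> twoRegions = Arrays.asList("", "row_050000"); // assumed split point

        System.out.println(countMapTasks(oneRegion, "row_000000", "row_100000"));
        System.out.println(countMapTasks(twoRegions, "row_000000", "row_100000"));
        // A narrower scan can only reduce the count, never raise it.
        System.out.println(countMapTasks(twoRegions, "row_000000", "row_010000"));
    }
}
```

In this model, setting startRow and endRow on the scan only filters which regions take part. It cannot produce more map tasks than there are regions, which is why splitting the region is what changes the count.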

Upvotes: 1
