VikasG

Reputation: 579

Execution time increases with increasing map jobs

I am trying to migrate some data from MySQL to HBase using sqoop import. Here is the command I'm using:

sqoop import --connect jdbc:mysql://hostname/database --username username -P \
  --query 'SELECT * FROM logs WHERE $CONDITIONS' --split-by log_id -m 4 \
  --hbase-table logs --column-family cf --hbase-create-table

The issue is that the execution time increases as the number of mappers increases. Since more mappers mean more parallel processing, the execution time should ideally decrease.

Here is the pattern:

No. of Maps     Time (in sec)
    1               16
    2               20
    4               29
    8               51
    10              55
    16              82
    25              122


As the numbers above show, the import is fastest with only one mapper. Any idea what the reason could be? Any help will be highly appreciated.
My cluster consists of a namenode and two datanodes.

Upvotes: 1

Views: 210

Answers (1)

Arnon Rotem-Gal-Oz

Reputation: 25909

Probably the load on MySQL when running multiple queries simultaneously. Also, the total running time (16 sec with one mapper) suggests that you are importing very little data, so adding more maps increases overhead while each map handles only a small data segment, and the overhead isn't offset. Lastly, you didn't say much about your cluster (which I am guessing is a small test one). If you are allocating more mappers than there are slots, the extra mappers will wait until slots free up, increasing the time even more.
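
To make the overhead argument concrete, here is a rough Python sketch that fits a straight line to the timings from the question and counts how many "waves" of mappers a given number of slots would need. The linear model, the function names, and the slot count of 4 are all assumptions for illustration; the question does not state how many map slots the cluster has.

```python
import math

# Timings copied from the question: (number of mappers, seconds).
TIMINGS = [(1, 16), (2, 20), (4, 29), (8, 51), (10, 55), (16, 82), (25, 122)]


def fit_line(points):
    """Least-squares fit of: seconds = base + per_mapper * mappers.

    Assumes the cost grows linearly with the mapper count, which is
    only a rough model of the real job.
    """
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    per_mapper = sxy / sxx
    base = mean_y - per_mapper * mean_x
    return base, per_mapper


def waves(mappers, slots):
    """Number of rounds needed when only `slots` mappers can run at once."""
    return math.ceil(mappers / slots)


if __name__ == "__main__":
    base, per_mapper = fit_line(TIMINGS)
    print(f"fixed cost: {base:.1f} s, cost per mapper: {per_mapper:.1f} s")

    hypothetical_slots = 4  # assumption: not given in the question
    for mappers, seconds in TIMINGS:
        print(mappers, "mappers ->", waves(mappers, hypothetical_slots), "wave(s),", seconds, "s")
```

On these numbers the fit comes out at roughly 12 seconds of fixed cost plus a bit over 4 seconds per mapper, which is what you would expect if per-task overhead dominates and the actual data transfer is negligible.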

Upvotes: 1
