Reputation: 579
I am trying to migrate some data from MySQL to HBase using sqoop import. Here is the command I'm using:
sqoop import --connect jdbc:mysql://hostname/database --username username -P \
  --query 'SELECT * FROM logs WHERE $CONDITIONS' --split-by log_id -m 4 \
  --hbase-table logs --column-family cf --hbase-create-table
The issue is that the execution time increases as the number of mappers increases. Since more mappers means more parallel processing, the execution time should ideally decrease.
Here is the pattern:

No. of maps   Time (sec)
1             16
2             20
4             29
8             51
10            55
16            82
25            122
As the numbers show, the import is fastest with only one mapper. Any idea what could be the reason? Any help will be highly appreciated.
My cluster consists of a namenode and two datanodes.
Upvotes: 1
Views: 210
Reputation: 25909
It is probably the load on MySQL when running multiple queries simultaneously. Also, judging by the total running time (16 sec with one mapper), you are importing very little data, so each additional map adds overhead while handling only a small segment of the data, and that overhead is never offset. Lastly, you didn't say much about your cluster (which I am guessing is a small test one); if you allocate more mappers than there are slots, the extra mappers wait until slots free up, which increases the time even more.
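To put a rough number on the overhead, here is a minimal sketch that fits a straight line to the timings posted in the question. The linear model (a fixed cost plus a constant cost per mapper) is my assumption, not something Sqoop reports, and `fit_line` is just a hypothetical helper name.

```python
# Timings copied from the question: (number of mappers, seconds).
runs = [(1, 16), (2, 20), (4, 29), (8, 51), (10, 55), (16, 82), (25, 122)]


def fit_line(points):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(points)
    sx = sum(x for x, _ in points)
    sy = sum(y for _, y in points)
    sxx = sum(x * x for x, _ in points)
    sxy = sum(x * y for x, y in points)
    slope = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    intercept = (sy - slope * sx) / n
    return intercept, slope


intercept, slope = fit_line(runs)
print(f"fixed cost ~ {intercept:.1f} s, cost per extra mapper ~ {slope:.1f} s")
```

On these numbers the fit comes out at roughly 12 seconds of fixed cost plus about 4.4 seconds for every mapper added. A steady per-mapper cost like that is consistent with startup and scheduling overhead dominating a very small import, though it does not by itself tell you which of the causes above is responsible.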
Upvotes: 1