Reputation: 578
I am trying to export data using sqoop from files stored in hdfs to vertica. For around 10k's of data the files get loaded within a few minutes. But when I try to run crores of data, it is loading around .5% within 15 mins or so. I have tried to increase the number of mappers, but they are not serving any purpose to improve efficienct. Even setting the chunk size to increase the number the mappers, does not increase the number.
Please help.
Thanks!
Upvotes: 1
Views: 1097
Reputation: 14891
Is this a "wide" dataset? It might be a sqoop bug https://issues.apache.org/jira/browse/SQOOP-2920 if number of columns is very high (in hundreds), sqoop starts choking (very high on cpu). When number of fields is small, it's usually other way around - when sqoop is bored and rdbms systems can't keep up.
Upvotes: 0
Reputation: 1319
As you are using Batch export try increasing the records per transaction and records per statement parameter using the following properties:
sqoop.export.records.per.statement :
property will aggregate multiple rows inside one single insert statement.
sqoop.export.records.per.transaction:
how many insert statements will be issued per transaction
I hope these will surely solves the issue.
Upvotes: 1
Reputation: 197
Most MPP/RDBMS have sqoop connectors to exploit the parallelism and increase efficiency in transfer of data between HDFS and MPP/RDBMS. However it seems the vertica has taken this approach: http://www.vertica.com/2012/07/05/teaching-the-elephant-new-tricks/ https://github.com/vertica/Vertica-Hadoop-Connector
Upvotes: 0