Shilpa

Reputation: 578

increase efficiency of sqoop export from hdfs

I am trying to export data from files stored in HDFS to Vertica using Sqoop. For around 10K rows, the files get loaded within a few minutes. But when I try to export tens of millions of rows (crores), only about 0.5% loads within 15 minutes or so. I have tried increasing the number of mappers, but that did not improve efficiency. Even setting the chunk size to increase the number of mappers does not change it.
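For reference, this is the shape of the command I am running (connection string, table name, and paths below are placeholders, not my real ones):

```shell
sqoop export \
  --connect jdbc:vertica://vertica-host:5433/mydb \
  --username dbuser \
  --password-file /user/shilpa/.db_password \
  --table sales_fact \
  --export-dir /data/sales_fact \
  --input-fields-terminated-by ',' \
  --num-mappers 8
```

Raising `--num-mappers` beyond this did not speed up the export.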

Please help.

Thanks!

Upvotes: 1

Views: 1097

Answers (3)

Tagar

Reputation: 14891

Is this a "wide" dataset? It might be a Sqoop bug: https://issues.apache.org/jira/browse/SQOOP-2920. If the number of columns is very high (in the hundreds), Sqoop starts choking (very high CPU usage). When the number of fields is small, it's usually the other way around: Sqoop is idle and the RDBMS can't keep up.

Upvotes: 0

Sachin Janani

Reputation: 1319

As you are using batch export, try increasing the records-per-statement and records-per-transaction values using the following properties:

sqoop.export.records.per.statement: aggregates multiple rows into a single INSERT statement.

sqoop.export.records.per.transaction: how many INSERT statements are issued per transaction.
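For example, these properties can be passed as generic Hadoop -D options, which must come immediately after the tool name (the connection string, table name, and values below are placeholders to adapt to your setup):

```shell
sqoop export \
  -Dsqoop.export.records.per.statement=1000 \
  -Dsqoop.export.records.per.transaction=100 \
  --connect jdbc:vertica://vertica-host:5433/mydb \
  --username dbuser \
  --password-file /user/.db_password \
  --table sales_fact \
  --export-dir /data/sales_fact \
  --batch
```

Larger batches mean fewer round trips to the database, which is usually where the export time goes.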

I hope this solves the issue.

Upvotes: 1

technotring

Reputation: 197

Most MPP/RDBMS vendors provide Sqoop connectors to exploit parallelism and increase the efficiency of data transfer between HDFS and the database. Vertica has taken this approach: http://www.vertica.com/2012/07/05/teaching-the-elephant-new-tricks/ https://github.com/vertica/Vertica-Hadoop-Connector

Upvotes: 0
