Reputation: 437
Does anyone know if there is a limit on the value of sqoop.export.records.per.statement for a Sqoop batch export job?
I have a very large amount of data to export from Impala to Vertica, around 200,000,000 rows. I get [Vertica][VJDBC](5065) ERROR: Too many ROS containers exist for the following projections
if records per statement is set too low, or java.lang.OutOfMemoryError: GC overhead limit exceeded
if it is set too high.
Anyone know how to fix this problem? Thanks!
Upvotes: 2
Views: 1754
Reputation: 7606
I think the limit is memory. If you increase the heap, it will let you set the number higher. Try adding -D mapred.child.java.opts=-Xmx1024M,
or some number larger than your current setting.
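For reference, a full invocation might look roughly like this (the JDBC URL, table, export directory, and sizes below are placeholders, not tested against your setup):

sqoop export \
  -D mapred.child.java.opts=-Xmx4096M \
  -D sqoop.export.records.per.statement=10000 \
  --connect jdbc:vertica://database_host:5433/mydb \
  --username user --password password \
  --table mytable \
  --export-dir /user/hive/warehouse/mytable \
  --batch

Note that the -D options have to come immediately after "sqoop export", before the tool-specific arguments.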
You could also try increasing sqoop.export.statements.per.transaction
and reducing your records per statement (see the sketch below). I suspect this won't help on the ROS container side, though, because I believe each batch of SQL = 1 COPY statement = 1 ROS container. I don't think it will convert multiple batches of INSERTs into a single COPY, but I don't have a way to test it right now.
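A sketch of that trade-off, with illustrative numbers only:

# fewer rows per INSERT, more INSERTs per commit:
# 1,000 x 100 = 100,000 rows per transaction
sqoop export \
  -D sqoop.export.records.per.statement=1000 \
  -D sqoop.export.statements.per.transaction=100 \
  ...   # remaining connection options as in the example above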
You could bypass Sqoop and stream the data instead (you may need to construct the COPY statement yourself), something like:
impala-shell -k -i server:port -B -q 'select * from mytable' --output_delimiter="|" | vsql -h database_host -U user -w password -c 'copy mytable from stdin direct'
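If the defaults don't match your data, spell the COPY out explicitly. Something along these lines (DIRECT, the NULL token, and the rejected-data path are my assumptions; adjust the NULL token to however impala-shell renders nulls in your version):

impala-shell -k -i server:port -B -q 'select * from mytable' --output_delimiter="|" | vsql -h database_host -U user -w password -c "COPY mytable FROM STDIN DELIMITER '|' NULL AS 'NULL' DIRECT REJECTED DATA '/tmp/mytable.rejects'"

DIRECT loads straight to ROS rather than staging in WOS, which is usually what you want at 200M rows, and a single big COPY creates one set of ROS containers instead of one per batch.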
Upvotes: 3