Reputation: 323
I have a Java application that reads from a group of tables, processes the data, and then inserts the processed data into one table. The data volume is really large and multithreading is unavoidable. The problem happens when multiple threads attempt to insert at the same time, which causes a write lock and an insert lock on the table. After that, all the threads wait for the table to become available until the connection times out.
- The records being inserted do not have any foreign keys pointing to any table.
- The records being inserted are completely independent of each other.
- The table does not have an incremental ID of any sort.
- No other application is using any of the tables involved in this application.
I am using a Java PreparedStatement to do the inserts. Each thread calls an insert method, which opens a separate connection per call.
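Simplified, each thread does something like this (the table and column names are placeholders, not the real schema):

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.SQLException;

// Called by each worker thread; every call opens its own connection
void insertRow(String jdbcUrl, String user, String password,
               String colA, String colB, long colC) throws SQLException {
    try (Connection conn = DriverManager.getConnection(jdbcUrl, user, password);
         PreparedStatement ps = conn.prepareStatement(
                 "INSERT INTO processed_data (col_a, col_b, col_c) VALUES (?, ?, ?)")) {
        ps.setString(1, colA);
        ps.setString(2, colB);
        ps.setLong(3, colC);
        ps.executeUpdate();
    }
}
```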
Is there anything I'm doing wrong, or is it a compatibility issue with Redshift?
Upvotes: 3
Views: 3142
Reputation: 2985
Doing large numbers of small inserts into a Redshift table through individual connections is definitely not what you want to do. The Redshift WLM query queues are not suited to many small concurrent operations. By default only 5 queries run concurrently, and you want to save those slots for your actual work queries, not for loading. This makes sense when you consider that Redshift is optimized for a small number of long-running queries over very large data sets.
For your case I recommend you take a look at using Kinesis Firehose to load your data into Redshift. It is designed for exactly this purpose: you can write to it efficiently from many threads, it will batch the data for you, it gives you an opportunity to filter/transform the data if needed, and it has the side effect of backing up to S3 if desired.
Here is a Java example.
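A minimal sketch of what that can look like with the AWS SDK for Java v1 (the delivery stream name and the pipe-delimited record format are assumptions; the stream must already be configured to COPY into your Redshift table):

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

import com.amazonaws.services.kinesisfirehose.AmazonKinesisFirehose;
import com.amazonaws.services.kinesisfirehose.AmazonKinesisFirehoseClientBuilder;
import com.amazonaws.services.kinesisfirehose.model.PutRecordRequest;
import com.amazonaws.services.kinesisfirehose.model.Record;

public class FirehoseWriter {
    // The Firehose client is thread safe, so all worker threads can share one instance
    private final AmazonKinesisFirehose firehose = AmazonKinesisFirehoseClientBuilder.defaultClient();
    private final String deliveryStreamName = "processed-data-stream"; // placeholder name

    public void write(String colA, String colB, long colC) {
        // Firehose concatenates records before running the Redshift COPY,
        // so terminate each record with a newline; format must match the COPY options on the stream
        String line = colA + "|" + colB + "|" + colC + "\n";

        PutRecordRequest request = new PutRecordRequest()
                .withDeliveryStreamName(deliveryStreamName)
                .withRecord(new Record()
                        .withData(ByteBuffer.wrap(line.getBytes(StandardCharsets.UTF_8))));

        // Firehose buffers the records and loads them into Redshift in batches
        firehose.putRecord(request);
    }
}
```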
Upvotes: 6