Redshift query queue usage on batch inserts

Question

Do DML batch insert JDBC queries receive the same treatment as SELECT queries when Redshift dispatches them to query queues? In particular, will multiple batch inserts be routed to the correct WLM queue (as defined by the user or user group) and run concurrently up to the queue's concurrency level?

It is practically impossible to tell from the Redshift console which queries are executing in which queues and which are executing concurrently, but our testing suggests that batch inserts are running serially rather than concurrently. Can anyone give me more insight?

Thanks.
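To see which queue each in-flight query landed in and whether it is running or waiting, the STV_WLM_QUERY_STATE system table can be polled while the batch inserts are in flight. A minimal sketch, assuming a PEP 249 (DB-API) connection to the cluster from a driver such as redshift_connector or psycopg2; the helper `wlm_snapshot` is hypothetical:

```python
from collections import Counter, defaultdict

# STV_WLM_QUERY_STATE has one row per query currently tracked by WLM:
# service_class identifies the queue, state says whether it is running or queued.
WLM_STATE_SQL = """
    SELECT query, service_class, state, queue_time, exec_time
    FROM stv_wlm_query_state
    ORDER BY service_class, query
"""


def wlm_snapshot(cursor):
    """Hypothetical helper: count in-flight queries per (service_class, state).

    `cursor` is any DB-API cursor opened on the Redshift cluster.
    """
    cursor.execute(WLM_STATE_SQL)
    summary = defaultdict(Counter)
    for _query, service_class, state, _queue_us, _exec_us in cursor.fetchall():
        # state is a fixed-width character column, so strip trailing padding
        summary[service_class][state.strip()] += 1
    return {sc: dict(counts) for sc, counts in summary.items()}
```

Seeing several inserts in the Running state in the same service class at once means WLM admitted them concurrently. A query blocked on a table lock may still hold its slot, though, so this view alone does not prove the inserts made progress in parallel.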

Naveen Vijay · Accepted Answer

Batch inserts aren't capable of running in parallel in Redshift. That's precisely why Redshift webinars, documentation, and articles promote the COPY command, which pulls the data set from one or more delimited files in S3 and loads them with as much parallelism as possible.
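COPY gets its parallelism by loading several files at once across the cluster's slices. AWS's loading guidance recommends splitting the input into roughly equal-sized files whose count is a multiple of the number of slices (which can be read with `SELECT COUNT(*) FROM stv_slices;`). A minimal sketch of that planning step; the row-count cap `max_rows_per_file` is a hypothetical stand-in for a file-size target, and both function names are illustrative:

```python
import math


def plan_file_count(total_rows, slices, max_rows_per_file=1_000_000):
    """Smallest multiple of `slices` that keeps every file under the row cap."""
    needed = max(1, math.ceil(total_rows / max_rows_per_file))
    return slices * math.ceil(needed / slices)


def split_round_robin(rows, n_files):
    """Deal rows into n_files parts so that part sizes differ by at most one."""
    parts = [[] for _ in range(n_files)]
    for i, row in enumerate(rows):
        parts[i % n_files].append(row)
    return parts
```

For example, 2,500,000 rows on a 4-slice cluster with the default cap needs 3 files by size, which is rounded up to 4 so that every slice gets one file.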

I have compared the performance of INSERT and COPY a couple of times, and the difference in scale is massive. The COPY command is amazing and blazing fast.

I recommend changing your data-load logic to delimited file -> S3 -> Redshift using COPY, rather than batch inserts.
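The delimited file -> S3 -> COPY pipeline can be sketched as below. This assumes CSV as the delimited format with gzip compression; the table name, bucket, key prefix, and IAM role ARN in the usage lines are placeholders, and the S3 upload itself (for example with boto3) is left out. The helper names are hypothetical:

```python
import csv
import gzip
import io


def to_csv_gzip(rows):
    """Serialize rows as gzip-compressed CSV, one line per row.

    csv.writer quotes fields that contain commas or quotes, which COPY's
    CSV format understands.
    """
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerows(rows)
    return gzip.compress(buf.getvalue().encode("utf-8"))


def build_copy_sql(table, s3_prefix, iam_role_arn):
    """Build a COPY that loads every S3 object whose key starts with s3_prefix.

    Values are interpolated directly, so they must come from trusted config.
    """
    return (
        f"COPY {table} FROM '{s3_prefix}' "
        f"IAM_ROLE '{iam_role_arn}' "
        "FORMAT AS CSV GZIP"
    )


# Usage with placeholder names (not real resources):
sql = build_copy_sql(
    "public.events",
    "s3://example-bucket/loads/events/part_",
    "arn:aws:iam::123456789012:role/ExampleRedshiftRole",
)
```

Because COPY treats the FROM value as a key prefix, uploading the parts as `part_0000.csv.gz`, `part_0001.csv.gz`, and so on under that prefix lets a single COPY statement load them all in parallel.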

Extract from the Amazon Redshift documentation, "Using a COPY command to load data":

We strongly recommend using the COPY command to load large amounts of data. Using individual INSERT statements to populate a table might be prohibitively slow. Alternatively, if your data already exists in other Amazon Redshift database tables, use INSERT INTO ... SELECT or CREATE TABLE AS to improve performance. For information, see INSERT or CREATE TABLE AS.
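When the data has to be reshaped or merged after loading, a common arrangement is to COPY into a staging table and then move the rows with one set-based INSERT INTO ... SELECT. A minimal sketch, assuming a DB-API connection with autocommit off and an existing staging table whose columns match the target; `load_via_staging` and its arguments are hypothetical:

```python
def load_via_staging(conn, copy_sql, staging_table, target_table):
    """Hypothetical helper: bulk-load into staging with COPY, then move the
    rows with one INSERT INTO ... SELECT, all in a single transaction."""
    cur = conn.cursor()
    try:
        # DELETE rather than TRUNCATE: in Redshift, TRUNCATE commits the
        # current transaction, so it could not be rolled back on failure.
        cur.execute(f"DELETE FROM {staging_table}")
        cur.execute(copy_sql)  # copy_sql should target staging_table
        cur.execute(f"INSERT INTO {target_table} SELECT * FROM {staging_table}")
        conn.commit()
    except Exception:
        conn.rollback()
        raise
    finally:
        cur.close()
```

If the COPY fails, the rollback leaves both the staging table and the target as they were, instead of leaving a half-loaded target.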
