Harsh P Waghela

Reputation: 83

Redshift: Support for concurrent inserts in the same table

I have a Lambda function that fires several insert queries at the same table concurrently through the Redshift Data API.


1. Insert into Table ( select <some analytical logic> from someTable_1)
2. Insert into Table ( select <some analytical logic> from someTable_2)
3. Insert into Table ( select <some analytical logic> from someTable_n)

Given that such queries are fired concurrently, does Redshift apply a lock to the table for each insert, or does it allow parallel insert queries into the same table? I'm asking because Postgres allows concurrent inserts.
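For reference, the concurrent-insert pattern described above can be sketched with the boto3 `redshift-data` client. The cluster, database, user, and table names below are placeholders, and the SELECT stands in for the real analytical logic:

```python
def build_insert_sql(source_table):
    # Placeholder for the real "insert into Table (select <logic> from ...)".
    return f"insert into target_table (select * from {source_table})"

def fire_concurrent_inserts(source_tables):
    # Requires AWS credentials; shown only to illustrate the pattern.
    # execute_statement is asynchronous, so the calls below do not wait
    # for each other -- the statements run concurrently on the cluster.
    import boto3
    client = boto3.client("redshift-data")
    statement_ids = []
    for src in source_tables:
        resp = client.execute_statement(
            ClusterIdentifier="my-cluster",  # placeholder
            Database="dev",                  # placeholder
            DbUser="awsuser",                # placeholder
            Sql=build_insert_sql(src),
        )
        statement_ids.append(resp["Id"])
    return statement_ids
```

You would poll `describe_statement` on each returned ID to see when the statements finish.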

https://www.postgresql.org/files/developer/concurrency.pdf

Upvotes: 1

Views: 1994

Answers (2)

Harsh P Waghela

Reputation: 83

Based on the discussion in the comments, one can conclude that concurrent inserts into the same table are blocking in Redshift, unlike in Postgres. Refer to the docs: https://docs.aws.amazon.com/redshift/latest/dg/r_Serializable_isolation_example.html

Edit:-

If you are wondering what exactly to look for in the docs linked above, the relevant excerpt is pasted below:

Concurrent COPY operations into the same table
Transaction 1 copies rows into the LISTING table:

begin;
copy listing from ...;
end;
Transaction 2 starts concurrently in a separate session and attempts to copy more rows into the LISTING table. Transaction 2 must wait until transaction 1 releases the write lock on the LISTING table, then it can proceed.

begin;
[waits]
copy listing from ...;
end;
The same behavior would occur if one or both transactions contained an INSERT command instead of a COPY command.
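The wait-then-proceed behavior in the excerpt can be modeled with an ordinary mutex: the second writer blocks until the first releases the table-level write lock, and no rows are lost. This is only an analogy for the serialization, not Redshift's actual implementation:

```python
import threading

table_lock = threading.Lock()  # stands in for the table's write lock
table_rows = []                # stands in for the LISTING table

def copy_into_listing(rows):
    # Like transaction 2 above, a concurrent writer blocks here until
    # the current lock holder finishes, then proceeds with its own write.
    with table_lock:
        table_rows.extend(rows)

threads = [
    threading.Thread(target=copy_into_listing, args=([i] * 100,))
    for i in range(3)
]
for t in threads:
    t.start()
for t in threads:
    t.join()

# All 300 rows land; the writes were serialized, not rejected or lost.
```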

Upvotes: 1

Bill Weiner

Reputation: 11032

Both Redshift and Postgres use MVCC - https://en.wikipedia.org/wiki/Multiversion_concurrency_control - so they will likely behave the same. There are no write-locks, just serial progression through the commit queue as the commits arrive. I've seen no functional problems with this in Redshift, so you should be good.

Functionally this is fine, but Redshift is columnar and Postgres is row-based, which leads to differences on the write side. Since these INSERTs likely add only a small (for Redshift) number of rows, and the minimum write size on Redshift is 1 MB per column per slice, there is likely to be a lot of unused space in these blocks. If this is done often, the table will accumulate a lot of wasted space and need frequent vacuuming. If you can, look at this write pattern to see whether the insert data can be batched more.
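One way to batch as suggested is to combine the per-source SELECTs into a single INSERT with UNION ALL, so Redshift allocates one set of 1 MB blocks instead of one set per statement. A minimal sketch, assuming the table names from the question and a shared column layout across sources:

```python
def batch_insert_sql(target, source_tables):
    # Build one "insert into ... (select ... union all select ...)"
    # statement instead of firing one INSERT per source table.
    selects = [f"select * from {t}" for t in source_tables]
    return f"insert into {target} (" + " union all ".join(selects) + ")"
```

The single statement could then be submitted once through the Data API instead of the concurrent per-table inserts.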

Upvotes: 1
