Andrew Lam
Andrew Lam

Reputation: 1391

Redshift: How to fix serializable isolation violation(1023) caused by concurrent MERGE operations?

My use case is to extract, transform and load data incrementally and in real time from x number of Lambda functions. I expect multiple Lambda functions to be running concurrently and Redshift to stay alive for read queries.

Since Redshift doesn't enforce primary key(s) constraints, I'm using aws documentation Merge examples - Example of a merge that replaces existing rows to enforce unique rows. This method works fine when there is only 1 instance of lambda function running.

-- Start a new transaction
begin transaction;

-- Delete any rows from SALES that exist in STAGESALES, because they are updates
-- The join includes a redundant predicate to collocate on the distribution key 
-- A filter on saletime enables a range-restricted scan on SALES

delete from sales
using stagesales
where sales.salesid = stagesales.salesid
and sales.listid = stagesales.listid
and sales.saletime > '2008-11-30';

-- Insert all the rows from the staging table into the target table
insert into sales
select * from stagesales;

-- End transaction and commit
end transaction;

-- Drop the staging table
drop table stagesales;

But as soon as > 1 lambda functions are running concurrently and accessing the same table, I'll receive:

"ERROR: 1023 DETAIL: Serializable isolation violation on table in Redshift" when performing operations in a transaction concurrently with another session. 

How should I modify this example to allow it to run in a concurrent environment?

Upvotes: 1

Views: 2946

Answers (3)

Marco Ullasci
Marco Ullasci

Reputation: 1

A 1023 is a retriable error. If it's happening only from time to time you can consider capturing it in your lambda function and then just submit the query again.

Upvotes: 0

ALOK KUMAR SINGH
ALOK KUMAR SINGH

Reputation: 26

did you try locking table in each code as this will not allow other transaction to modify the data? You can all have separete staging table for different lambdas and have a merge job running parally which combine data from them and merge to final table.

Upvotes: 0

Nathan Griffiths
Nathan Griffiths

Reputation: 12756

The issue you are running into is that you have multiple lambda functions executing DML on the same table concurrently. Redshift doesn't support concurrent transactions that are not serializable, i.e. that try and modify the same data at the same time. In that scenario Redshift will abort one or more of the transactions to ensure that all DML that gets executed is serializable.

Your current design will not work properly when scaled to more than one lambda function due to these restrictions in the way Redshift works. You will need to devise a method of managing the lambda function(s) such that there are not conflicting DML statements being run concurrently on the same table. It's not clear why you are using multiple lambda functions to do this so I can't comment on what an alternative would look like.

Upvotes: 3

Related Questions