sundarls

Reputation: 17

How to simulate "Serializable isolation violation" error in Amazon Redshift for testing stored procedures?

I am working with Amazon Redshift and I am encountering an intermittent error in my stored procedure. The error message is:

DETAIL: Serializable isolation violation on table - 101841, transactions forming the cycle are

The error occurs when performing an UPDATE statement on the table incremental_table inside the stored procedure. We are using Serializable isolation level for transaction handling.

I need to simulate this error in a controlled environment to troubleshoot and reproduce the issue. Could anyone guide me on how to simulate this "Serializable isolation violation" error in Amazon Redshift? Specifically, I would like to:

Understand the conditions under which Redshift throws this error.

Know if there is a way to manually trigger such a violation for testing.

Get any insight on best practices for handling such errors.

Thanks Sundar

Upvotes: 0

Views: 24

Answers (1)

Bill Weiner

Reputation: 11082

Sure thing. As far as understanding this error goes, the AWS Redshift docs do a good job - https://docs.aws.amazon.com/redshift/latest/dg/c_serial_isolation.html and https://aws.amazon.com/premiumsupport/knowledge-center/redshift-serializable-isolation/

If you are not familiar with MVCC (multiversion concurrency control), a quick read of https://en.wikipedia.org/wiki/Multiversion_concurrency_control might help.

Let me try to simplify this down a bit. It is easy to miss the forest for the trees.

  1. Two transactions are concurrently in flight (after BEGIN and before COMMIT or END). Each is working from its own "version" of the database state, which matches the state of the database at the time that transaction began.
  2. Each transaction modifies a table that is part of the other transaction's initial state - a table the other transaction is using as a source for its queries.

That's it. Redshift doesn't "know" that the changes the other transaction is making are material to the results this transaction is producing - just that they COULD be material. Since they COULD be material, the serialization hazard exists and one transaction is aborted to prevent the possibility of indeterminate results.

The error is raised when this "A depends on B in one transaction while B depends on A in another" situation exists AND the second transaction tries to COMMIT its results. Up until that point there is no conflict, as the second transaction could still ROLLBACK.

Hopefully this is clear; if not, ask for clarification.

You can (or at least could some years back) create these circumstances using a workbench with 2 sessions (connections) open. The steps are below, with a SQL sketch of the whole sequence after the list.

  • First make sure that AUTOCOMMIT is off for both sessions
  • You will need 2 test tables with data in each, fully committed to the database
  • In session A BEGIN a transaction and select from table A. In session B BEGIN a transaction and select from table B. This will make the transactions "linked" to the starting states of tables A and B respectively.
  • Now in session A INSERT into table B FROM table A. This will make the table B state in session A different from the one in session B, and its values dependent on table A.
  • Then in session B INSERT into table A FROM table B. This will make the table A state in session B different from the one in session A, and its values dependent on table B.
  • COMMIT results in session A.
  • COMMIT results in session B - boom!
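
Here is a rough SQL sketch of that sequence. The table and column names (tab_a, tab_b, val) are made up for illustration, and the exact error text/code you get back may differ from what I show in the comments. Run each block in the session indicated, in the order shown, with AUTOCOMMIT off in both sessions.

  -- One-time setup (either session); commit so both tables are fully in the database
  CREATE TABLE tab_a (val INT);
  CREATE TABLE tab_b (val INT);
  INSERT INTO tab_a VALUES (1), (2), (3);
  INSERT INTO tab_b VALUES (10), (20), (30);
  COMMIT;

  -- Session A: open a transaction and read table A
  BEGIN;
  SELECT * FROM tab_a;

  -- Session B: open a transaction and read table B
  BEGIN;
  SELECT * FROM tab_b;

  -- Session A: write into table B based on table A
  INSERT INTO tab_b SELECT val FROM tab_a;

  -- Session B: write into table A based on table B
  INSERT INTO tab_a SELECT val FROM tab_b;

  -- Session A: commit first - this should succeed
  COMMIT;

  -- Session B: commit second - this is where the
  -- "Serializable isolation violation on table ..." error should appear
  COMMIT;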

This should work, but I think the Redshift team has been working on reducing the number of these errors, so they may have made it more difficult to trigger. Let me know.

These conditions can be set up on a workbench since we can leave a transaction open in one session while we do things in the other. In a real working database these transactions open and close quickly, and getting the timing to line up is difficult. You can use code to set this up, but to reproduce the error reliably you will need to control the timing of events in the 2 sessions.

As for addressing these errors in your ETL solution, there are a few things you should do.

  • Don't have independent, parallel ETL processes working on the database at the same time. Or at least make sure that these parallel processes don't share data domains within the database.
  • Have dependencies within your ETL so that updates that may conflict don't run at the same time.
  • Don't have super long transactions. COMMIT as soon as the data changes are complete and the database would be in a consistent state.
  • You can use explicit locks to force the serialization you want, but these can impact database performance and are confusing. I've seen more people create problems with explicit locks than solve them, so be careful (a short sketch of this approach follows this list).
  • There are some switches in the database to change how serialization is performed, but these can have database performance impacts. So, just like with explicit locks, I'd recommend care and treating these as a last resort rather than a first choice.
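
To illustrate the explicit-lock option, here is a minimal sketch. I'm borrowing incremental_table from your question; the column names are placeholders, and whether a table-level lock is acceptable for your workload is something you'll have to judge.

  -- Each writing transaction takes an exclusive lock up front, so a second
  -- writer waits here until the first commits instead of failing at COMMIT time.
  -- Note the lock also blocks readers of the table until this transaction ends.
  BEGIN;
  LOCK incremental_table;
  UPDATE incremental_table
     SET load_status = 'processed'   -- placeholder column
   WHERE batch_id = 42;              -- placeholder column
  COMMIT;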

I'll just say it again - the best answer is to review your processes that change table contents and order these processes to remove the possibility of conflict. Just because you "resolve" your serialization errors doesn't mean that your database is in a self-consistent state.

Upvotes: 0
