theog
theog

Reputation: 2092

amazon redshift concurrent write results in inserted records, causing duplicates

I have been trying to solve a problem where two concurrent updates on the same table are causing additional records to be created/inserted. Never experienced this in any other relational database, and nor would i. So i believe it's potential a quirk in redshifts architecture of distributing queries across multiple nodes, however cannot pinpoint or provide a real world example.

Before these two updates are run, i insert new data into the table. The insert contains a daily snapshot that fills out one day of data, most columns have empty values ready for the updates to populate them.

The updates are run concurrently, which are simple update sql's, updating their respective columns. If run individually i do not see additional records created and no duplication.

The updates operate across the entire table, over 200 million records, however the duplication occurs only in the records that where populated recently(the new data for that days period.

This is kind of a worry, as i would never assume an update would ever create new records, addition to the records created with the first insert.

What is even more bizzare is that the duplicate records hold different data.

I have checked to veryify that no other queries are running beyond the expected, by looking at redshifts query logs (stl_query).

Upvotes: 4

Views: 2334

Answers (1)

Diego
Diego

Reputation: 36146

I find really hard to believe that an update created new values, are you really sure about this? I've been trough complicate situations when It comes to concurrent transactions on the same table, so what I suggest is that you explicitly lock your table with:

lock table <table> in exclusive mode;

before you manipulate it (exclusive mode will allow reads but any write attempt will have to wait)

If you don't and 2 transactions try to update (Inserts are fine, BTW) the same table, you are most likely yo get a "ERROR: 1023 - DETAIL: Serializable isolation violation on table" - or the behavior you are reporting

Upvotes: 1

Related Questions