Codehunter
Codehunter

Reputation: 69

Serializable isolation violation on table error

I'm new to redshift and facing 'D': 'Serializable isolation violation on table error. This is occurring during the data archive process time frame(mid night load) and impacting the regular batch job data load for same table.

I've written below the archive load process. Is there any best way to avoid such error in my below script?

Can you help? Thanks in advance.

INSERT INTO abc.data_arch
    (  col1,col2......
)

With cte_365days_older
AS
(
    Select * from  abc.abc_data_365 
    WHERE abc_ts < dateadd(day, -365, current_date)
    --LIMIT 1
)
SELECT
    col1,col2...... 
FROM cte_365days_older src 
WHERE not exists (select 1 from abc.data_arch trgt
                    where src.abc_hkey = trgt.abc_hkey                          
                 )
order by col1,col2
  ;
 DELETE 
    from  abc.abc_data_365 
    WHERE abc_ts < dateadd(day, -365, current_date);
    COMMIT;
    VACUUM DELETE ONLY abc.abc_data_365;
    COMMIT;

Error:-

'D': 'Serializable isolation violation on table - 342561, transactions forming the cycle are: 2234341, 2034548 (pid:3235)'

Upvotes: 0

Views: 297

Answers (1)

53epo
53epo

Reputation: 909

OK, with the regular batch job data running at the same time, it looks like the conflict is between the transaction which is doing the INSERT, and the VACUUM command running in parallel in another session (as per Max's answer).

Regarding the data architecture, DELETE is a relatively expensive operation, so to avoid the error you might consider an alternative approach:

  • Create a VIEW on your abc_data_365 table, which only contains the most recent 365 days of data.
  • Run the DELETE less frequently (maybe monthly), and do it immediately after the INSERT has finished for that day

Then let your end-users SELECT from the VIEW, not the base table.

  • End-users will probably see almost no performance degradation (if you run it monthly then at worst a SELECT will be scanning 1/12 of the table extra)
  • You'll be asking Redshift to do the expensive DELETE far less frequently

Something like this:

CREATE VIEW v_abc_data_365 AS
SELECT * FROM abc_data_365
WHERE abc_ts > DATEADD(DAY, -365, current_date);

Upvotes: 0

Related Questions