Tim
Tim

Reputation: 105

Safely using TRUNCATE with active AFTER INSERT triggers in PostgreSQL?

TLDR: In PostgreSQL what would be a safe way to clean the staging table or use TRUNCATE, ensuring the AFTER INSERT trigger is finished processing all rows before TRUNCATE is executed?

Can I rely on the TRUNCATE statement to wait for any potentially active triggers to finish processing all batch inserted rows before deleting them?

Details: I have a staging_table where I insert data continuously 24/7 using the PostgreSQL COPY tool (timescaledb-parallel-copy), with 32 parallel workers up to 120k ROW/s. This data is transferred from the staging_table to the final time_series_table using an AFTER INSERT trigger that performs an UPSERT operation for each row. I clean the staging_table running the TRUNCATE statement as a scheduled job every night.

I understand the TRUNCATE statement uses an ACCESS EXCLUSIVE lock, meaning that this mode guarantees that the holder is the only transaction accessing the table in any way. And waits for any current active processes to finish before executing.

For example, when inserting data into the table in 10k row batches of 120k ROW/s, I'm not exactly sure how the AFTER INSERT trigger that performs an UPSERT operation behaves internally as it processes rows one by one. So, I'm concerned about whether the TRUNCATE statement might execute before the AFTER INSERT trigger has processed all the new rows.

My understanding is that when I insert for example a batch of 10k rows, it is treated as a single transaction. Therefore, the AFTER INSERT FOR EACH ROW trigger, which executes the UPSERT operation as one transaction one row at a time, meaning that if any single row in the batch encounters an issue, the entire batch will fail. Also, I assume the entire batch must be fully processed by the trigger before a TRUNCATE operation gets executed.

Simplified AFTER INSERT UPSERT example:

CREATE OR REPLACE FUNCTION upsert_example_function()
RETURNS TRIGGER AS $$
BEGIN
    -- Proceed with upsert into a time_series target table
    INSERT INTO time_series_table (id, time, value, status)
    VALUES (NEW.id, NEW.time, NEW.value, NEW.status)
    ON CONFLICT (id, time)
    DO UPDATE SET
        value = EXCLUDED.value,
        status = EXCLUDED.status
    WHERE (
        -- Update unless status_code indicates a locked state
        time_series_table.status <> 'locked'
        )
        -- AND if the new data differs in value or status
        AND (
        time_series_table.value IS DISTINCT FROM EXCLUDED.value OR
        time_series_table.status IS DISTINCT FROM EXCLUDED.status
        );

    -- Return the row to indicate successful processing
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER trg_upsert_example
AFTER INSERT ON staging_table
FOR EACH ROW EXECUTE FUNCTION upsert_example_function();

Upvotes: 0

Views: 118

Answers (0)

Related Questions