Reputation: 105
TLDR: In PostgreSQL what would be a safe way to clean the staging table or use TRUNCATE
, ensuring the AFTER INSERT
trigger is finished processing all rows before TRUNCATE
is executed?
Can I rely on the TRUNCATE
statement to wait for any potentially active triggers to finish processing all batch inserted rows before deleting them?
Details:
I have a staging_table
where I insert data continuously 24/7 using the PostgreSQL COPY tool (timescaledb-parallel-copy), with 32 parallel workers up to 120k ROW/s. This data is transferred from the staging_table
to the final time_series_table
using an AFTER INSERT
trigger that performs an UPSERT
operation for each row. I clean the staging_table
running the TRUNCATE
statement as a scheduled job every night.
I understand the TRUNCATE
statement uses an ACCESS EXCLUSIVE
lock, meaning that this mode guarantees that the holder is the only transaction accessing the table in any way. And waits for any current active processes to finish before executing.
For example, when inserting data into the table in 10k row batches of 120k ROW/s, I'm not exactly sure how the AFTER INSERT
trigger that performs an UPSERT
operation behaves internally as it processes rows one by one. So, I'm concerned about whether the TRUNCATE
statement might execute before the AFTER INSERT
trigger has processed all the new rows.
My understanding is that when I insert for example a batch of 10k rows, it is treated as a single transaction. Therefore, the AFTER INSERT
FOR EACH ROW
trigger, which executes the UPSERT
operation as one transaction one row at a time, meaning that if any single row in the batch encounters an issue, the entire batch will fail. Also, I assume the entire batch must be fully processed by the trigger before a TRUNCATE operation gets executed.
Simplified AFTER INSERT
UPSERT
example:
CREATE OR REPLACE FUNCTION upsert_example_function()
RETURNS TRIGGER AS $$
BEGIN
-- Proceed with upsert into a time_series target table
INSERT INTO time_series_table (id, time, value, status)
VALUES (NEW.id, NEW.time, NEW.value, NEW.status)
ON CONFLICT (id, time)
DO UPDATE SET
value = EXCLUDED.value,
status = EXCLUDED.status
WHERE (
-- Update unless status_code indicates a locked state
time_series_table.status <> 'locked'
)
-- AND if the new data differs in value or status
AND (
time_series_table.value IS DISTINCT FROM EXCLUDED.value OR
time_series_table.status IS DISTINCT FROM EXCLUDED.status
);
-- Return the row to indicate successful processing
RETURN NEW;
END;
$$ LANGUAGE plpgsql;
CREATE TRIGGER trg_upsert_example
AFTER INSERT ON staging_table
FOR EACH ROW EXECUTE FUNCTION upsert_example_function();
Upvotes: 0
Views: 118