Gaurav Shewale
Gaurav Shewale

Reputation: 33

Insert bulk data into big-query without keeping it in streaming buffer

My motive here is as follow:

  1. Insert bulk records into big-query every half an hour
  2. Delete the record if the exists

Those records are transactions which change their statuses from: pending, success, fail and expire.

BigQuery does not allow me to delete the rows that are inserted just half an hour ago as they are still in the streaming buffer.

can anyone suggest me some workaround as i am getting some duplicate rows in my table.

Upvotes: 1

Views: 1619

Answers (1)

Elliott Brossard
Elliott Brossard

Reputation: 33705

A better course of action would be to:

  • Perform periodic loads into a staging table (loading is a free operation)
  • After the load completes, execute a MERGE statement.

You would want something like this:

MERGE dataset.TransactionTable dt
USING dataset.StagingTransactionTable st
ON dt.tx_id = st.tx_id
WHEN MATCHED THEN
UPDATE dt.status = st.status
WHEN NOT MATCHED THEN
INSERT (tx_id, status) VALUES (st.tx_id, st.status)

Upvotes: 2

Related Questions