danD

Reputation: 716

Use Snowpipe to load the latest data and delete the previous one

I have a table that gets loaded from S3 every time a new file arrives in the bucket, and I am using Snowpipe to do so.

However, the requirement is to refresh the table on every load, keeping only the latest data. My thought process to accomplish that is below.

  1. Create a pipe on table t1 to copy from S3.
  2. Create a Stream on table t1.
  3. Create a task that runs every 5 minutes, with a condition of running only when the stream has data. The task statement will delete the records from the table whose load_date does not match the load_date in the stream. (The task references the stream in a DML operation so that the stream offset advances and the stream empties.) A rough sketch of this setup is shown below.
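
Something like this minimal sketch is what I have in mind (object names such as my_stage, my_pipe, t1_stream, refresh_t1 and the warehouse my_wh are placeholders, and the CSV file format is just an assumption):

```sql
-- 1. Pipe that copies new S3 files from the stage into t1 (assumed names)
CREATE OR REPLACE PIPE my_pipe AUTO_INGEST = TRUE AS
  COPY INTO t1
  FROM @my_stage
  FILE_FORMAT = (TYPE = 'CSV');

-- 2. Stream on t1 to capture newly loaded rows
CREATE OR REPLACE STREAM t1_stream ON TABLE t1;

-- 3. Task that runs every 5 minutes when the stream has data and deletes
--    rows whose load_date differs from the latest load_date in the stream;
--    referencing the stream in the DML is intended to advance its offset
CREATE OR REPLACE TASK refresh_t1
  WAREHOUSE = my_wh
  SCHEDULE = '5 MINUTE'
  WHEN SYSTEM$STREAM_HAS_DATA('T1_STREAM')
AS
  DELETE FROM t1
  WHERE load_date <> (SELECT MAX(load_date) FROM t1_stream);

ALTER TASK refresh_t1 RESUME;
```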

So basically, I am using the table's own stream to delete the old data from it. However, my concern is what happens when there are multiple loads on the same day, and the approach does not look very clean. Is there a better way?

Upvotes: 1

Views: 1221

Answers (1)

Dean Flinter

Reputation: 674

I would create a new target table for the stream data and merge into this new table on every run. If you really need to delete data from t1, you could set up a child task that deletes data from t1 based on what you have in t2 (after you have merged). A rough sketch is below.
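
Roughly something like this, assuming a join key id, a load_date column, and placeholder names (my_wh, merge_into_t2, prune_t1); adjust to your actual schema:

```sql
-- Parent task: merge the stream contents into the new target table t2
CREATE OR REPLACE TASK merge_into_t2
  WAREHOUSE = my_wh
  SCHEDULE = '5 MINUTE'
  WHEN SYSTEM$STREAM_HAS_DATA('T1_STREAM')
AS
  MERGE INTO t2 USING t1_stream s
    ON t2.id = s.id
  WHEN MATCHED THEN UPDATE SET t2.load_date = s.load_date
  WHEN NOT MATCHED THEN INSERT (id, load_date) VALUES (s.id, s.load_date);

-- Child task: after the merge has run, remove superseded rows from t1
-- based on what is now in t2
CREATE OR REPLACE TASK prune_t1
  WAREHOUSE = my_wh
  AFTER merge_into_t2
AS
  DELETE FROM t1
  USING t2
  WHERE t1.id = t2.id
    AND t1.load_date < t2.load_date;

-- Resume the child before the parent so the whole tree is enabled
ALTER TASK prune_t1 RESUME;
ALTER TASK merge_into_t2 RESUME;
```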

However, the stream will record those delete operations too. Depending on how your load works, you could create an append-only stream, or, when consuming the stream, use the metadata columns to filter for only the change events you are interested in.
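
For example, something along these lines (again with assumed names):

```sql
-- Append-only stream: only records inserts on t1, so the deletes issued by
-- the pruning task do not show up as change records
CREATE OR REPLACE STREAM t1_stream ON TABLE t1 APPEND_ONLY = TRUE;

-- Alternatively, filter on the stream metadata columns when consuming it,
-- keeping only the insert events
SELECT *
FROM t1_stream
WHERE METADATA$ACTION = 'INSERT'
  AND METADATA$ISUPDATE = FALSE;
```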

Upvotes: 1
