Lostsoul
Lostsoul

Reputation: 26057

Can I prevent duplicate data in bigquery?

I'm playing with BQ and I create a table and inserted some data. I reinserted it and it created duplicates. I'm sure I'm missing something, but is there something I can do to ignore it if the data exists in the table?

My use case is I get a stream of data from various clients and sometimes their data will include some data they previously already sent(I have no control on them submitting).

Is there a way to prevent duplicates when certain conditions are met? The easy one is if the entire data is the same but also if certain columns are present?

Upvotes: 0

Views: 843

Answers (2)

HappyFreddie
HappyFreddie

Reputation: 135

Just like the brother up there said. Using MERGE like :

MERGE `your_dataset.temp_table` T
USING (
  SELECT
  FORMAT_DATE('%b-%y', date) AS month,
  EXTRACT(YEAR FROM date) as year,
  FORMAT_DATE('%b', date) as month_text,
  EXTRACT(MONTH FROM date) as month_num,
  1.5 as metric_value
  FROM (
  SELECT
      DATE_TRUNC(DATE_SUB(CURRENT_DATE(), INTERVAL 1 MONTH), MONTH) AS date) AS main 
  ) S
ON T.month = S.month
AND T.year = S.year
AND T.month_text = S.month_text
AND T.month_num = S.month_num
WHEN NOT MATCHED THEN
INSERT VALUES (S.month, S.year,S.month_text, S.month_num, S.metric_value)   

Upvotes: 0

Cylldby
Cylldby

Reputation: 1978

It's difficult to answer your question without a clear idea of the table structure, but it feels like you could be interested in the MERGE statement: ref here.

With this DML statement you can perform a mix of INSERT, UPDATE, and DELETE statements, hence do exactly what you are describing.

Upvotes: 2

Related Questions