Delete Duplicate Record From BigQuery

Question

I have table which have column ID, which define unique records, all the rows having same ID are duplicate, I want to remove the duplicate rows and keeping only unique ID

After Removal (My Data in BigQuery Table)

ID Data
1   2
2   3

I can create a new table having unique records only but is there any way I can update the existing table i.e remove the duplicate records from the table.

Mikhail Berlyant · Accepted Answer

Consider below ( all the rows having same ID are duplicate)

select as value any_value(t)
from `project.dataset.table` t
group by id

with output

In case , if definition of duplicate is - all the rows having all fields duplicate - consider below

select as value any_value(t)
from `project.dataset.table` t
group by format('%t', t)

Below is example of how to use it with DDL:

create or replace table `project.dataset.table` as 
select * from ( 
  select as value any_value(t)
  from `project.dataset.table` t
  group by format('%t', t)
);

Delete Duplicate Record From BigQuery

Answers (1)

Related Questions