Marcin_S
Marcin_S

Reputation: 539

Delete Duplicate Record From BigQuery

I have table which have column ID, which define unique records, all the rows having same ID are duplicate, I want to remove the duplicate rows and keeping only unique ID

ID Data
1   2
1   2
2   3
2   3

After Removal (My Data in BigQuery Table)

ID Data
1   2
2   3

I can create a new table having unique records only but is there any way I can update the existing table i.e remove the duplicate records from the table.

Upvotes: 0

Views: 205

Answers (1)

Mikhail Berlyant
Mikhail Berlyant

Reputation: 172993

Consider below ( all the rows having same ID are duplicate)

select as value any_value(t)
from `project.dataset.table` t
group by id    

with output

enter image description here

In case , if definition of duplicate is - all the rows having all fields duplicate - consider below

select as value any_value(t)
from `project.dataset.table` t
group by format('%t', t)

Below is example of how to use it with DDL:

create or replace table `project.dataset.table` as 
select * from ( 
  select as value any_value(t)
  from `project.dataset.table` t
  group by format('%t', t)
);

Upvotes: 1

Related Questions