EGM8686
EGM8686

Reputation: 1572

Remove rows based on duplicate field in column

I have a table with fields like this

F1  F2   F3  F4
1    4    A   5
2    4    A   5
3    4    B   5
4    4    C   5
5    4    D   5

I would like to delete all rows where F3 is a duplicate, so in order to get no duplicate values in field F3:

F1  F2   F3  F4
1    4    A   5
3    4    B   5
4    4    C   5
5    4    D   5

Thx!

Upvotes: 0

Views: 58

Answers (2)

Gordon Linoff
Gordon Linoff

Reputation: 1269563

You can use aggregation:

select any_value(f1), any_value(f2), f3, any_value(f4)
from t
group by f3;

If you want all the values to be from a particular row, you can use row_number():

select t.* except (seqnum)
from (select t.*,
             row_number() over (partition by f3 order by f3) as seqnum
      from t
     ) t;

You can adjust the order by key to specify which row you want to keep (say the largest f1 value or whatever).

And another method is:

select as value any_value(t)
from t
group by f3;

Upvotes: 1

Mikhail Berlyant
Mikhail Berlyant

Reputation: 172974

Below is for BigQuery Standard SQL

#standardSQL
SELECT AS VALUE ANY_VALUE(t)
FROM `project.dataset.table` t
GROUP BY F3

You can test, play with above using sample data from your question as in below example

#standardSQL
WITH `project.dataset.table` AS (
  SELECT 1 F1, 4 F2, 'A' F3, 5 F4 UNION ALL
  SELECT 2, 4, 'A', 5 UNION ALL
  SELECT 3, 4, 'B', 5 UNION ALL
  SELECT 4, 4, 'C', 5 UNION ALL
  SELECT 5, 4, 'D', 5 
)
SELECT AS VALUE ANY_VALUE(t)
FROM `project.dataset.table` t
GROUP BY F3   

with result

Row F1  F2  F3  F4   
1   1   4   A   5    
2   3   4   B   5    
3   4   4   C   5    
4   5   4   D   5    

Upvotes: 1

Related Questions