harozi2
harozi2

Reputation: 19

BigQuery: How to delete rows that have 2 columns with identical data?

I haven't been able to find any similar question but I am looking for a way to delete all but 1 of similar rows that have 2 specific columns that contain identical data. For example:

price symbol date
13 RT 2020-10-1
80.9 DX 2020-10-2
81 DX 2020-10-2
90 AP 2020-10-3
89.9 AP 2020-10-3
90 AP 2020-10-3
85 DX 2020-10-4

In this example, I'd like to be able to run a query in the BQ console to find any of the rows with that have both the date AND the symbol as identical and delete one of them (which one gets deleted doesn't matter much.) The query would delete 1 of the DX rows on 2020-10-2 and 2 of the AP rows on 2020-10-3.

I appreciate the help!!

Upvotes: 1

Views: 945

Answers (4)

Mikhail Berlyant
Mikhail Berlyant

Reputation: 172974

Below is for BigQuery Standard SQL

create or replace table your_table as 
with temp as (
  select as value array_agg(t order by price limit 1) [offset(0)]
  from your_table t
  group by symbol, date
)
select * from temp;   

Note: you can remove order by price part if you don't care about which exactly row to survive out of those with duplicate symbol and date

if applied to sample data from your question - resulted table is

enter image description here

Upvotes: 0

Gordon Linoff
Gordon Linoff

Reputation: 1269563

You specifically say that you want to delete based on two columns, not all three. In your example data, the price is the same on all rows, but that might not be the case in the real data.

You can use create or replace table, but I would recommend:

CREATE OR REPLACE TABLE t AS
    SELECT ARRAY_AGG(t LIMIT 1)[ORDINAL(1)].*
    FROM `t` t
    GROUP BY symbol, date;

You can also express this using window functions:

CREATE OR REPLACE TABLE t AS
    SELECT t.* EXCEPT (seqnum)
    FROM (SELECT t.*,
                 ROW_NUMBER() OVER (PARTITION BY symbol, date ORDER BY price) as seqnum
          FROM `t` t
         ) t
    WHERE seqnum = 1;

Upvotes: 1

Popeye
Popeye

Reputation: 35900

As you are using the big-query, I would suggest you to use CREATE OR REPLACE TABLE as follows:

CREATE OR REPLACE TABLE your_table
AS SELECT DISTINCT price, symbol, date
FROM your_table;

Upvotes: 2

Nihat Çelik
Nihat Çelik

Reputation: 29

You can use this example code.

DELETE FROM [SampleDB].[dbo].[Employee]
    WHERE ID NOT IN
    (
        SELECT MAX(ID) AS MaxRecordID
        FROM [SampleDB].[dbo].[Employee]
        GROUP BY [FirstName], 
                 [LastName], 
                 [Country]
    );

Check this link for more info: https://www.sqlshack.com/different-ways-to-sql-delete-duplicate-rows-from-a-sql-table/

Upvotes: 0

Related Questions