Reputation: 19
I haven't been able to find any similar question but I am looking for a way to delete all but 1 of similar rows that have 2 specific columns that contain identical data. For example:
price | symbol | date |
---|---|---|
13 | RT | 2020-10-1 |
80.9 | DX | 2020-10-2 |
81 | DX | 2020-10-2 |
90 | AP | 2020-10-3 |
89.9 | AP | 2020-10-3 |
90 | AP | 2020-10-3 |
85 | DX | 2020-10-4 |
In this example, I'd like to be able to run a query in the BQ console to find any of the rows with that have both the date AND the symbol as identical and delete one of them (which one gets deleted doesn't matter much.) The query would delete 1 of the DX rows on 2020-10-2 and 2 of the AP rows on 2020-10-3.
I appreciate the help!!
Upvotes: 1
Views: 945
Reputation: 172974
Below is for BigQuery Standard SQL
create or replace table your_table as
with temp as (
select as value array_agg(t order by price limit 1) [offset(0)]
from your_table t
group by symbol, date
)
select * from temp;
Note: you can remove order by price
part if you don't care about which exactly row to survive out of those with duplicate symbol and date
if applied to sample data from your question - resulted table is
Upvotes: 0
Reputation: 1269563
You specifically say that you want to delete based on two columns, not all three. In your example data, the price
is the same on all rows, but that might not be the case in the real data.
You can use create or replace table
, but I would recommend:
CREATE OR REPLACE TABLE t AS
SELECT ARRAY_AGG(t LIMIT 1)[ORDINAL(1)].*
FROM `t` t
GROUP BY symbol, date;
You can also express this using window functions:
CREATE OR REPLACE TABLE t AS
SELECT t.* EXCEPT (seqnum)
FROM (SELECT t.*,
ROW_NUMBER() OVER (PARTITION BY symbol, date ORDER BY price) as seqnum
FROM `t` t
) t
WHERE seqnum = 1;
Upvotes: 1
Reputation: 35900
As you are using the big-query, I would suggest you to use CREATE OR REPLACE TABLE
as follows:
CREATE OR REPLACE TABLE your_table
AS SELECT DISTINCT price, symbol, date
FROM your_table;
Upvotes: 2
Reputation: 29
You can use this example code.
DELETE FROM [SampleDB].[dbo].[Employee]
WHERE ID NOT IN
(
SELECT MAX(ID) AS MaxRecordID
FROM [SampleDB].[dbo].[Employee]
GROUP BY [FirstName],
[LastName],
[Country]
);
Check this link for more info: https://www.sqlshack.com/different-ways-to-sql-delete-duplicate-rows-from-a-sql-table/
Upvotes: 0