Reputation: 389
I have a table with more than 20 columns and more the 300.000 rows out of which the relevant columns for my question are the following:
column1 |column2 |column3 | date
123 | 657 | 2222 | 20 dec
123 | 658 | 2222 | Null
124 | 543 | 3333 | Null
124 | 544 | 3333 | 1 Feb
125 | 098 | 4444 | Null
125 | 065 | 4444 | Null
I did the following query to sort them as I needed:
select *
from table
order by column1,column2,date desc;
The result is:
column1 |column2 |column3 | date
123 | 657 | 2222 | 20 dec
123 | 658 | 2222 | Null
124 | 544 | 3333 | 1 Feb
124 | 543 | 3333 | Null
125 | 065 | 4444 | Null
125 | 098 | 4444 | Null
The main purpose is to delete the rows where based on column1
are duplicates if they have a date that is not null
. If they both have Null
then I sould be able to keep just one. So what I am looking for is this:
column1 |column2 |column3 | date
123 | 657 | 2222 | 20 dec
124 | 544 | 3333 | 1 Feb
125 | 065 | 4444 | Null
Can someone help me to develop this query. I wish to retreive ALL of the other columns not just the ones displayed.
I am using Azure SQL which is basically SQL Server
I tried doing this:
SELECT * FROM table AS main
WHERE main.column4 IN
(SELECT sub.MinID FROM
(SELECT column1, column3, min(column4) As MinID
FROM table
GROUP BY column1, column3) AS sub)
ORDER BY main.column1, main.column3;
The problem with this is that since the rule is min(column4)
when I have a Not Null
whose column4
value is less than the other it choose the NULL
. I tried putting min(date)
but I get this error:
Arithmetic overflow error converting expression to data type smalldatetime
I am guessing because of the Null
values.
How can input the rule that if one is Not null
and the other is Null
then choose the Not Null
Upvotes: 2
Views: 6462
Reputation: 32685
Looks like classic top-n-per-group
. One way to do it is to use ROW_NUMBER
:
WITH
CTE
AS
(
SELECT
column1
,column2
,column3
,dt
,ROW_NUMBER() OVER (PARTITION BY column1 ORDER BY dt desc, column2) AS rn
FROM tbl
)
SELECT
column1
,column2
,column3
,dt
FROM CTE
WHERE rn = 1
ORDER BY
column1
;
PARTITION BY column1
means that numbering of rows restarts when column1
changes. ORDER BY dt desc, column2
determines the order when numbering the rows. In the end you just pick the first row of the partition using WHERE rn = 1
.
The final result will have only one row per column1
, so adding more columns to the final ORDER BY
doesn't change anything.
Upvotes: 4
Reputation: 1922
CTE is probably your best bet:
WITH CTE
AS (
SELECT column1
,column2
,column3
,DATE
,ROW_NUMBER() OVER (
PARTITION BY column1 ORDER BY
date
,column1
,column2
,column3 DESC
) AS RN
FROM table
)
DELETE
FROM CTE
WHERE RN <> 1
To check that what you are deleting is the correct data, just change DELETE to SELECT * at the end.
Upvotes: 0
Reputation: 146419
Try this:
Select * from table t
Where (date is not null
and not exists
(select * from table
where date is Null
and column1 = t.Column1)) Or
column2 =
(Select Max(column2) from table
Where column1 = t.Column1)
Upvotes: 0