Reputation: 190
I have a table (structure below) that I need to clean up by deleting rows for each Object_ID
:
WHERE Current_Step is NULL and Change = 'change'
and Date_of_Change <> MIN(Date_of_Change)
That is, I need to leave only the row with minimum date for each Object_ID
.
Object_ID | Current_Step | Change | Date_of_Change |
---|---|---|---|
0025307 | NULL | change | 16.11.2021 |
0025307 | NULL | change | 19.11.2021 |
0025307 | NULL | change | 19.11.2021 |
I am using MS SQL.
There are no primary keys.
All columns are VARCHAR
except Date_of_Change
being of type DATE
.
The reason why I need to clean up this table is because it was incorrectly filled because source query was checking for IF NULL = NULL and then marked those status changes as changed even though they did not change. So I need to revert values back to original date they were changed because if they still have value NULL that means there were no actual changes happening in status.
My attempt in identifying rows that I need to keep:
SELECT [Object_ID]
,MIN([Date_of_Change])
FROM table
WHERE [Current_Step] IS NULL
AND [Change] = 'change'
GROUP BY Object_ID
I just need to remove other rows with the same Object_ID whose Date_of_Change is not equal to the one identified in query above.
Upvotes: 1
Views: 807
Reputation: 9387
So you have a table with versioned objects which holds change records associated to the object with some details and a date.
Now you want to select
MIN
function on a DATE
column-typeThis oldest should be retained/kept and stay. All other object change-versions should be deleted.
A. Selecting the fist/oldest changes per object in 2 steps.
SELECT Object_ID, COUNT(Object_ID) AS Count_Changes, MIN(Date_of_Change) AS First_Change
FROM table
GROUP BY Object_ID
Resultset contains each object with the total count of changes and the date of the first change.
JOIN
:SELECT *
FROM table t
-- join with a table-subquery having only 2 columns to correlate
JOIN (
SELECT Object_ID, MIN(Date_of_Change) AS First_Change
FROM table
WHERE Current_Step is NULL and Change = 'change'
GROUP BY Object_ID
) m ON t.Object_ID = m.Object_ID AND t.Date_of_Change = m.First_Change
WHERE Current_Step is NULL and Change = 'change'
This are the rows to keep and not remove. The first change of each object should be retained and not cleaned.
B. Now we can invert the JOIN-condition to get all the rows, that we want to delete/clean:
) m ON t.Object_ID = m.Object_ID AND t.Date_of_Change = m.First_Change
to not-equal:
) m ON t.Object_ID = m.Object_ID AND t.Date_of_Change <> m.First_Change
SELECT COUNT(Object_ID) AS records_to_remove
FROM table t
-- join with a table-subquery having only 2 columns to correlate
JOIN (
SELECT Object_ID, MIN(Date_of_Change) AS First_Change
FROM table
WHERE Current_Step is NULL and Change = 'change'
GROUP BY Object_ID
) m ON t.Object_ID = m.Object_ID AND t.Date_of_Change <> m.First_Change
WHERE Current_Step is NULL and Change = 'change'
DELETE FROM table t
JOIN (
SELECT Object_ID, MIN(Date_of_Change) AS First_Change
FROM table
WHERE Current_Step is NULL and Change = 'change'
GROUP BY Object_ID
) m ON t.Object_ID = m.Object_ID AND t.Date_of_Change <> m.First_Change
WHERE t.Current_Step is NULL AND t.Change = 'change'
USING
on other DBMSSome DBMS do not support JOIN in DELETE statements, but alternatives like USING
:
DELETE FROM table t
USING (
SELECT Object_ID, MIN(Date_of_Change) AS First_Change
FROM table t2
WHERE t2.Current_Step is NULL AND t2.Change = 'change'
) AS m
WHERE ...
AND t.Object_ID = m.Object_ID AND t.Date_of_Change <> m.First_Change
Upvotes: 0
Reputation: 61
do join on same table like i did on 'Table1223' below.
Example:
DELETE tbl
FROM Table1223 tbl
JOIN (SELECT * FROM Table1223) objID
ON objID.Object_ID = tbl.Object_ID
WHERE tbl.Date_of_Change > objID.Date_of_Change
Upvotes: 1