BGBVlC
BGBVlC

Reputation: 29

Removing SQL Rows from Query if two rows have an identical ID but differences in the columns

I´m currently working stuck on a SQL issue (well, mainly because I can´t find a way to google it and my SQL skills do not suffice to solve it myself)

I´m working on a system where documents are edited. If the editing process is finished, users mark the document as solved. In the MSSQL database, the corresponding row is not updated but instead, a new row is inserted. Thus, every document that has been processed has [e.g.: should have] multiple rows in the DB.

See the following situation:

ID ID2 AnotherCondition Steps Process Solved
1 1 yes Three ATAT AF
2 2 yes One ATAT FR
2 3 yes One ATAT EG
2 4 yes One ATAT AF
3 5 no One ABAT AF
4 6 yes One ATAT FR
5 7 no One AVAT EG
6 8 yes Two SATT FR
6 9 yes Two SATT EG
6 10 yes Two SATT AF

I need to select the rows which have not been processed yet. A "processed" document has a "FR" in the "Solved" column. Sadly other versions of the document exist in the DB, with other codes in the "Solved" columns.

Now: If there is a row which has "FR" in the "Solved" column I need to remove every row with the same ID from my SELECT statement as well. Is this doable?

In order to achieve this, I have to remove the rows with the IDs 2 | 4 (because the system sadly isn´t too reliable I guess) | and 6 in my select statement. Is this possible in general?

What I could do is to filter out the duplicates afterwards, in python/js/whatever. But I am curious whether I can "remove" these rows directly in the SQL statement as well.

To rephrase it another time: How can I make a select statement which returns only (in this example) the rows containing the ID´s 1, 3 and 5?

Upvotes: 1

Views: 69

Answers (2)

lemon
lemon

Reputation: 15482

If you need to delete all rows where every id doesn't have any "Solved = 'no'", you can use a DELETE statement that will exclude all "id" values that have at least one "Solved = 'no'" in the corresponding rows.

DELETE FROM tab
WHERE id NOT IN (SELECT id FROM tab WHERE Solved1 = 'no');

Check the demo here.


Edit. If you need to use a SELECT statement, you can simply reverse the condition in the subquery:

SELECT * 
FROM tab
WHERE id NOT IN (SELECT id FROM tab WHERE Solved1 = 'yes');

Check the demo here.

Upvotes: 1

Andreas Sundström
Andreas Sundström

Reputation: 218

I'm not sure I understand your question correct:

...every document that has been processed has [...] multiple rows in the DB

I need to find out which documents have not been processed yet

So it seems you need to find unique documents with no versions, this could be done using a GROUP BY with a HAVING clause:

SELECT
Id
FROM dbo.TableName
GROUP BY Id
HAVING COUNT(*) = 1

Upvotes: 1

Related Questions