Reputation: 2696
I have a database table setup like this:
id | code | group_id | status ---
---|-------|---------|------------
1 | abcd1 | group_1 | available
2 | abcd2 | group_1 | available
3 | adsd3 | group_1 | available
4 | dfgd4 | group_1 | available
5 | vfcd5 | group_1 | available
6 | bgcd6 | group_2 | available
7 | abcd7 | group_2 | available
8 | ahgf8 | group_2 | available
9 | dfgd9 | group_2 | available
10 | qwer6 | group_2 | available
In the example above, each group_id
has 5 total rows (arbitrary for example, total rows will be dynamic and vary), I need to remove every row that matches available
in status
except for 2 of them (which 2 does not matter, as long as there are 2 of them remaining)
Basically every unique group_id
should only have 2
total rows with status
of available
. I am able to do a simple SQL query to remove all of them, but struggling to come up with a SQL query to remove all except for 2 ... please helppppp :)
Upvotes: 0
Views: 990
Reputation: 108510
If the column "id
" is the PRIMARY KEY or a UNIQUE KEY, then we could use a correlated subquery to get the second lowest value for a particular group_id
.
We could then use that to identify rows for group_id
that have higher values of the "id
" column.
A query something like this:
SELECT t.`id`
, t.`group_id`
FROM `setup_like_this` t
WHERE t.`status` = 'available'
AND t.`id`
> ( SELECT s.`id`
FROM `setup_like_this` s
WHERE s.`status` = 'available'
AND s.`group_id` = t.`group_id`
ORDER
BY s.`id`
LIMIT 1,1
)
We test that as a SELECT first, to examine the rows that are returned. When we are satisfied this query is returning the set of rows we want to delete, we can replace SELECT ... FROM
with DELETE t.* FROM
to convert it to a DELETE
statement to remove the rows.
Error 1093 encountered converting to DELETE statement.
One workaround is to make the query above into a inline view, and then join to the target table
DELETE q.*
FROM `setup_like_this` q
JOIN ( -- inline view, query from above returns `id` of rows we want to delete
SELECT t.`id`
, t.`group_id`
FROM `setup_like_this` t
WHERE t.`status` = 'available'
AND t.`id`
> ( SELECT s.`id`
FROM `setup_like_this` s
WHERE s.`status` = 'available'
AND s.`group_id` = t.`group_id`
ORDER
BY s.`id`
LIMIT 1,1
)
) r
ON r.id = q.id
Upvotes: 1
Reputation: 15961
If code
is unique, you can use subqueries to keep the "min" and "max"
DELETE FROM t
WHERE t.status = 'available'
AND (t.group_id, t.code) NOT IN (
SELECT group_id, MAX(code)
FROM t
WHERE status = 'available'
GROUP BY group_id
)
AND (t.group_id, t.code) NOT IN (
SELECT group_id, MIN(code)
FROM t
WHERE status = 'available'
GROUP BY group_id
)
Similarly, with an auto increment id:
DELETE FROM t
WHERE t.status = 'available'
AND t.id NOT IN (
SELECT MAX(id) FROM t WHERE status = 'available' GROUP BY group_id
UNION
SELECT MIN(id) FROM t WHERE status = 'available' GROUP BY group_id
)
I reworked the subquery into a UNION instead in this version, but the "AND" format would work just as well too. Also, if "code" was unique across the whole table, the NOT IN could be simplified down to excluding the group_id as well (though it would still be needed in the subqueries' GROUP BY clauses).
Edit: MySQL doesn't like subqueries referencing tables being UPDATEd/DELETEd in the WHERE of the query doing the UPDATE/DELETE; in those cases, you can usually double-wrap the subquery to give it an alias, causing MySQL to treat it as a temporary table (behind the scenes).
DELETE FROM t
WHERE t.status = 'available'
AND t.id NOT IN (
SELECT * FROM (
SELECT MAX(id) FROM t WHERE status = 'available' GROUP BY group_id
UNION
SELECT MIN(id) FROM t WHERE status = 'available' GROUP BY group_id
) AS a
)
Another alternative, I don't recall if MySQL complains as much about joins in DELETE/UPDATE....
DELETE t
FROM t
LEFT JOIN (
SELECT MIN(id) AS minId, MAX(id) AS maxId, 1 AS keep_flag
FROM t
WHERE status = 'available'
GROUP BY group_id
) AS tKeep ON t.id IN (tKeep.minId, tKeep.maxId)
WHERE t.status = 'available'
AND tKeep.keep_flag IS NULL
Upvotes: 2
Reputation: 1271151
To keep the min and max ids, I think a join
is the simplest solution:
DELETE t
FROM t LEFT JOIN
(SELECT group_id, MIN(id) as min_id, MAX(id) as max_id
FROM t
WHERE t.status = 'available'
GROUP BY group_id
) tt
ON t.id IN (tt.min_id, tt.max_id)
WHERE t.status = 'available' AND
tt.group_id IS NULL;
Upvotes: 2
Reputation: 3089
select id, code, group_id, status
from (
select id, code, group_id, status
, ROW_NUMBER() OVER (
PARTITION BY group_id
ORDER BY id DESC) row_num
) rownum
from a
) q
where rownum < 3
Upvotes: 0