sMyles
sMyles

Reputation: 2696

SQL/MySQL DELETE all rows EXCEPT 2 of them

I have a database table setup like this:

   id | code  | group_id | status ---
   ---|-------|---------|------------
    1 | abcd1 | group_1 | available
    2 | abcd2 | group_1 | available
    3 | adsd3 | group_1 | available
    4 | dfgd4 | group_1 | available
    5 | vfcd5 | group_1 | available

    6 | bgcd6 | group_2 | available
    7 | abcd7 | group_2 | available
    8 | ahgf8 | group_2 | available
    9 | dfgd9 | group_2 | available
   10 | qwer6 | group_2 | available

In the example above, each group_id has 5 total rows (arbitrary for example, total rows will be dynamic and vary), I need to remove every row that matches available in status except for 2 of them (which 2 does not matter, as long as there are 2 of them remaining)

Basically every unique group_id should only have 2 total rows with status of available. I am able to do a simple SQL query to remove all of them, but struggling to come up with a SQL query to remove all except for 2 ... please helppppp :)

Upvotes: 0

Views: 990

Answers (4)

spencer7593
spencer7593

Reputation: 108510

If the column "id" is the PRIMARY KEY or a UNIQUE KEY, then we could use a correlated subquery to get the second lowest value for a particular group_id.

We could then use that to identify rows for group_id that have higher values of the "id" column.

A query something like this:

 SELECT t.`id`
      , t.`group_id` 
   FROM `setup_like_this` t
  WHERE t.`status`  = 'available'
    AND t.`id`
          > ( SELECT s.`id`
                FROM `setup_like_this` s
               WHERE s.`status`   = 'available'
                 AND s.`group_id` = t.`group_id`
               ORDER
                  BY s.`id`
               LIMIT 1,1
            )

We test that as a SELECT first, to examine the rows that are returned. When we are satisfied this query is returning the set of rows we want to delete, we can replace SELECT ... FROM with DELETE t.* FROM to convert it to a DELETE statement to remove the rows.


Error 1093 encountered converting to DELETE statement.

One workaround is to make the query above into a inline view, and then join to the target table

DELETE q.*
  FROM `setup_like_this` q
  JOIN ( -- inline view, query from above returns `id` of rows we want to delete
         SELECT t.`id`
              , t.`group_id` 
           FROM `setup_like_this` t
          WHERE t.`status`  = 'available'
            AND t.`id`
                  > ( SELECT s.`id`
                        FROM `setup_like_this` s
                       WHERE s.`status`   = 'available'
                         AND s.`group_id` = t.`group_id`
                       ORDER
                          BY s.`id`
                       LIMIT 1,1
                    )
       ) r
    ON r.id = q.id

Upvotes: 1

Uueerdo
Uueerdo

Reputation: 15961

If code is unique, you can use subqueries to keep the "min" and "max"

DELETE FROM t
WHERE t.status = 'available' 
   AND (t.group_id, t.code) NOT IN (
      SELECT group_id, MAX(code) 
      FROM t 
      WHERE status = 'available' 
      GROUP BY group_id
  )
  AND (t.group_id, t.code) NOT IN (
      SELECT group_id, MIN(code) 
      FROM t 
      WHERE status = 'available' 
      GROUP BY group_id
  )

Similarly, with an auto increment id:

DELETE FROM t
WHERE t.status = 'available' 
   AND t.id NOT IN (
      SELECT MAX(id) FROM t WHERE status = 'available' GROUP BY group_id
      UNION
      SELECT MIN(id) FROM t WHERE status = 'available' GROUP BY group_id
)

I reworked the subquery into a UNION instead in this version, but the "AND" format would work just as well too. Also, if "code" was unique across the whole table, the NOT IN could be simplified down to excluding the group_id as well (though it would still be needed in the subqueries' GROUP BY clauses).


Edit: MySQL doesn't like subqueries referencing tables being UPDATEd/DELETEd in the WHERE of the query doing the UPDATE/DELETE; in those cases, you can usually double-wrap the subquery to give it an alias, causing MySQL to treat it as a temporary table (behind the scenes).

DELETE FROM t
WHERE t.status = 'available' 
   AND t.id NOT IN ( 
      SELECT * FROM (
         SELECT MAX(id) FROM t WHERE status = 'available' GROUP BY group_id
         UNION
         SELECT MIN(id) FROM t WHERE status = 'available' GROUP BY group_id
      ) AS a
)

Another alternative, I don't recall if MySQL complains as much about joins in DELETE/UPDATE....

DELETE t
FROM t
LEFT JOIN (
   SELECT MIN(id) AS minId, MAX(id) AS maxId, 1 AS keep_flag 
   FROM t 
   WHERE status = 'available' 
   GROUP BY group_id
) AS tKeep ON t.id IN (tKeep.minId, tKeep.maxId)
WHERE t.status = 'available' 
    AND tKeep.keep_flag IS NULL

Upvotes: 2

Gordon Linoff
Gordon Linoff

Reputation: 1271151

To keep the min and max ids, I think a join is the simplest solution:

DELETE t
     FROM t LEFT JOIN
          (SELECT group_id, MIN(id) as min_id, MAX(id) as max_id
           FROM t
           WHERE t.status = 'available' 
           GROUP BY group_id
          ) tt
          ON t.id IN (tt.min_id, tt.max_id)
WHERE t.status = 'available' AND
      tt.group_id IS NULL;

Upvotes: 2

dougp
dougp

Reputation: 3089

select id, code, group_id, status
from (
  select id, code, group_id, status
  , ROW_NUMBER() OVER (
      PARTITION BY group_id 
      ORDER BY id DESC) row_num
    ) rownum
  from a
) q
where rownum < 3

Upvotes: 0

Related Questions