Jacthoma
Jacthoma

Reputation: 111

Removing duplicates from SQL query

I'm trying to create a SQL query that will not return rows that have the same ID. I would also like to specify a primary row so that, in the event two rows have the same ID, the primary row will be returned. If no primary row is specified I would like to return the first row.

Here is an example of the database I would like to query.

+----+---------+-------+
| id | primary | label |
+----+---------+-------+
| 1  |    Y    |   A   |
| 1  |         |   B   |  
| 2  |         |   C   |
| 2  |         |   D   |
| 3  |         |   E   |
+----+---------+-------+

Here is an example of the result I am trying to achieve

+----+---------+-------+
| id | primary | label |
+----+---------+-------+
| 1  |    Y    |   A   |
| 2  |         |   C   |
| 3  |         |   E   |
+----+---------+-------+

I've been trying to use select distinct but I'm very unsure as to the direction to pursue to solve this problem. Any help is greatly appreciated. Thank you!

Upvotes: 0

Views: 81

Answers (4)

Joachim Isaksson
Joachim Isaksson

Reputation: 180917

I can see you already got an answer, but you could also use a regular LEFT JOIN;

SELECT * FROM mytable a 
LEFT JOIN mytable b
  ON a.id = b.id AND (
            a.label>b.label AND a.primary IS NULL AND b.primary IS NULL OR 
                                a.primary IS NULL AND b.primary='Y')
WHERE b.id IS NULL

An SQLfiddle to test with.

Upvotes: 1

Brian DeMilia
Brian DeMilia

Reputation: 13248

Subqueries would be more appropriate than DISTINCT in your case.

Try the below. Here is a demonstration of it getting your desired result: http://sqlfiddle.com/#!2/97fdd3/1/0

By the way, when there is no "primary" for the ID, this will choose the lowest label value for that ID. This is as others have stated more reliable than the 'order in the database'.

select *
  from tbl t
 where t.label = (select x.label
                    from tbl x
                   where x.primary = 'Y'
                     and x.id = t.id)
    or (not exists
        (select 1
           from tbl x
          where x.primary = 'Y'
            and x.id = t.id) and
        t.label = (select min(x.label) from tbl x where x.id = t.id))

Upvotes: 2

DarkSide
DarkSide

Reputation: 3709

Maybe something like this works (not tested):

SELECT DISTINCT
      t1.id
     ,t2.primary
     ,t2.label
FROM table t1
    LEFT JOIN (
        SELECT id, primary, label
        FROM table
        GROUP BY id
        ORDER BY primary DESC, label ASC
        LIMIT 1
    ) t2 ON t2.id=t1.id

Upvotes: 0

Barmar
Barmar

Reputation: 780994

SELECT a.id, 
       IFNULL(primaries.primary, '') AS primary, 
       IFNULL(primaries.label, secondary.label) AS label
FROM yourTable AS a
LEFT JOIN (SELECT *
           FROM yourTable
           WHERE primary = 'Y') AS primaries
ON a.id = primaries.id AND a.label = primaries.label
LEFT JOIN (SELECT id, MIN(label) AS label
           FROM yourTable
           WHERE primary != 'Y'
           GROUP BY id) AS secondary
ON a.id = secondary.id AND a.label = secondary.label

Upvotes: 0

Related Questions