Vadim
Vadim

Reputation: 569

MySQL GROUP BY while keeping certain rows by column content

There is table with duplicated rows. See rows 1 and 2:

id   full_name    email           status   active
1    John Doe     [email protected]   ok       1
2    John Doe     [email protected]   null     1
3    Ricky Duke   [email protected]   null     1
4    Jane Doe     [email protected]   block    1

I need to select distinct rows, not randomly - one distinct row, but the one that has a 'status' NOT NULL.

My query is:

SELECT full_name, email
FROM `subscribers`
WHERE active = 1 AND (status = 'ok' OR status IS NULL)
GROUP BY email

That query selects distinct rows randomly, without prioritizing 'status' field.

How can i prioritize selection of distinct rows, that has a 'status' NOT NULL, and select ones with NULL only in case there is no rows with 'ok' status is present?

Upvotes: 0

Views: 248

Answers (3)

Ricardo Frasson
Ricardo Frasson

Reputation: 83

(SELECT full_name, email
FROM `subscribers`
WHERE active = 1 AND status IS NOT NULL
GROUP BY email) 
UNION ALL
(SELECT full_name, email
FROM `subscribers`
WHERE active = 1 AND status IS NULL AND
email not in (SELECT distinct email
FROM `subscribers` 
WHERE active = 1 AND status IS NOT NULL)
GROUP BY email);

Upvotes: 0

GMB
GMB

Reputation: 222582

You could filter with a correlated subquery that does conditional ordering, and gives a lowest priority to null statuses:

select t.*
from mytable t
where t.id = (
    select id
    from mytable t1
    where 
        t1.full_name = t.full_name
        and t1.email = t.email
        and t1.active = t.active
    order by status is null, status  
    limit 1
)

This defines duplicats as records that have the same full_name, email and active. You might want to adapt that to your actual definition of duplicates.

Demo on DB Fiddle:

id | full_name  | email         | status | active
-: | :--------- | :------------ | :----- | :-----
 1 | John Doe   | [email protected] | ok     | 1     
 3 | Ricky Duke | [email protected] | null   | 1     
 4 | Jane Doe   | [email protected] | block  | 1     

Upvotes: 1

Gordon Linoff
Gordon Linoff

Reputation: 1270391

You can use row_number():

select s.*
from (select s.*,
             row_number() over (partition by email order by (status is not null) desc) as seqnum
      from subscribers s
      where active = 1
     ) s
where seqnum = 1;

Upvotes: 1

Related Questions