HWD
HWD

Reputation: 1629

MySQL: select row only where closest to date has column value

I want to return all rows that were public in May (2019-05), so if a row was turned to draft (and not back to public) at any point before the end of May, I don't want it. For example:

id | post_id | status | date
-------------------------
 1 | 1       | draft  | 2019-03-25
 2 | 1       | public | 2019-04-02
 3 | 1       | draft  | 2019-05-25
 4 | 2       | draft  | 2019-03-10
 5 | 2       | public | 2019-04-01
 6 | 2       | draft  | 2019-06-01

The desired result for the above would return post_id 2 because its last status change prior to the end of May was to public.

post_id 1 was put back in draft before the end of May, so it would not be included.

I'm not sure how to use the correct join or sub-queries to do this as efficiently as possible.

Upvotes: 0

Views: 48

Answers (2)

Gordon Linoff
Gordon Linoff

Reputation: 1269773

You seem to want the status as of 2019-05-31. A correlated subquery seems like the simplest solution:

select t.*
from t
where t.date = (select max(t2.date)
                from t t2
                where t2.post_id = t.post_id and
                      t2.date <= '2019-05-31'
               );

To get the ones that are public, just add a WHERE condition:

select t.*
from t
where t.date = (select max(t2.date)
                from t t2
                where t2.post_id = t.post_id and
                      t2.date <= '2019-05-31'
               ) and
      t.status = 'public';

For performance, you want an index on (post_id, date).

You can also phrase this using a JOIN:

select t.*
from t join
     (select t2.post_id, max(t2.date) as max_date
      from t t2
      where t2.date <= '2019-05-31'
      group by t2.post_id
     ) t2
     on t2.max_date = t.date
where t.status = 'public';

I would expect the correlated subquery to have better performance with the right indexes. However, sometimes MySQL surprises me.

Upvotes: 1

Barbaros &#214;zhan
Barbaros &#214;zhan

Reputation: 65228

we need to determine whether

  1. the status of each post_id is public prior to the month May (the subquery with max(date)),
  2. any post_id exists with status not equals public within the month May,
  3. and then exclude the post_id satisfying the matter 2.

So, you can use :

select distinct t1.post_id
  from tab t1
where t1.post_id not in
    (
     select distinct t1.post_id
       from tab t1
       join
       (
        select post_id, max(date) as date
          from tab 
         where '2019-05-01'> date
         group by post_id ) t2
         on t1.post_id = t2.post_id 
      where t1.status != 'public' 
        and t1.date < '2019-06-01' 
        and t1.date > '2019-04-30'
);

+---------+
| POST_ID |
+---------+
|    2    |
+---------+

Demo

Upvotes: 0

Related Questions