Mawg
Mawg

Reputation: 40169

MySQL grouping and sorting

mysql> describe jobs;
+--------------+---------+------+-----+---------+----------------+
| Field        | Type    | Null | Key | Default | Extra          |
+--------------+---------+------+-----+---------+----------------+
| job_id       | int(11) | NO   | PRI | NULL    | auto_increment |
| candidate_id | int(11) | NO   | MUL | NULL    |                |
| company_id   | int(11) | NO   | MUL | NULL    |                |
| start_date   | date    | NO   | MUL | NULL    |                |
| end_date     | date    | NO   | MUL | NULL    |                |
+--------------+---------+------+-----+---------+----------------+
5 rows in set (0.01 sec)

Each candidate can have multiple jobs. I want to find the latest job for each candidate (based on start date, because end date can be 0000-00-00), and check (in PHP) if the end_Date is not 0000-00-00.

(if your last end date is not 0000-00-00, then you are currently unemployed, which is what I am looking for; I don't know how to do this in my query, so will do it in PHP).

The query SELECT candidate_id, end_Date FROM jobs ORDER BY candidate_id , start_date DESC gets me halfway there.

+--------------+------------+
| candidate_id | end_Date   |
+--------------+------------+
|            1 | 2019-08-31 |
|            1 | 2019-01-31 |
|            1 | 2019-05-31 |
|            2 | 0000-00-00 |
|            2 | 2018-02-28 |
|            2 | 2017-05-31 |
|            2 | 2016-09-30 |
|            3 | 0000-00-00 |
|            3 | 2019-05-31 |
|            4 | 2019-04-30 |
|            4 | 2019-09-30 |

(How) can I get only the first entry (row with the most recent start_date) for each candidate Id? And can I get only those where the end date is not 0000-00-00?

(Oops, it looks like my ordering by end_date is not working)

Upvotes: 2

Views: 68

Answers (4)

Gordon Linoff
Gordon Linoff

Reputation: 1270181

You can do this with aggregation:

select candidate_id,
       (case when sum(end_date = '0000-00-00') > 0
             then '0000-00-00'
             else max(end_date)
        end) as enddate
from jobs j
group by candidate_id;

Or another method:

select j.*
from jobs j
where j.end_date = '0000-00-00' or
      (not exists (select 1
                   from jobs j2
                   where j2.candidate_id = j.candidate_id and
                         (j2.end_date = '0000-00-00' or
                          j2.end_date > j.end_date
                         )
                  )
      );

Or even:

select j.*
from jobs j
where j.job_id = (select j2.job_id
                  from jobs j2
                  where j2.candidate_id = j.candidate_id
                  order by (j2.end_date = '0000-00-00') desc,
                           j2.end_date desc
                 );

Upvotes: 0

fifonik
fifonik

Reputation: 1606

Option without sub-query:

SELECT
    j.*
FROM
    jobs AS j
    LEFT JOIN jobs AS j2 ON (
            j2.candidate_id = j.candidate_id
        AND j2.start_date   > j.start.date
    )
WHERE
    j2.candidate_id IS NULL

You'd like to have composite index (candidate_id, start_date) to optimize the query.

Upvotes: 1

ScaisEdge
ScaisEdge

Reputation: 133380

You could use a join on max start_date group by candidate

select  * 
from jobs j
inner join  (

  select candidate_id ,  max(start_date) max_start_date 
  from jobs
  group by   candidate_id

  ) t on t.candidate_id = j.candidate_id 
    and t.max_start_date = j.start_date 

Upvotes: 1

GMB
GMB

Reputation: 222542

You can filter with a correlated subquery:

select j.*
from jobs j
where j.start_date = (
    select max(start_date)
    from jobs j1
    where j1.candidate_id = j.candidate_id and j1.end_date <> '0000-00-00'
)

The subquery returns the greatest start_date whose end_date is not null for the current candidate.

Another typical method to solve this top 1 per group problem is to use an anti-left join:

select j.*
from jobs j
left join jobs j1 
    on  j1.candidate_id = j.candidate_id
    and j1.start_date > j.start_date
    and j1.end_date is not null
where 
    j.end_date is not null
    and j1.job_id is null

This phrases as: give me the records with a non-null end_date for which no other record exists with the same candidate_id, a greater start_date and a non-null end_date.

Upvotes: 2

Related Questions