Get the minimum and maximum of date from a historical table in one row for postgres/redshift?

Question

I have a historical employee table like below.

job_cd  empl_id     hr_begin_dt hr_end_dt 
900234  100349583   2014-10-10  2014-10-25
900234  100349583   2014-10-26  2014-11-12
900234  100349583   2014-11-13  2014-12-21
900234  100349583   2014-12-22  2014-12-31
900234  100349583   2015-01-01  2015-01-03
900234  100349583   2015-01-04  2015-01-04
900234  100349583   2015-01-05  2015-02-07
P03251  100349583   2015-02-08  2015-02-10
900234  100349583   2015-02-08  2015-02-10
900234  100349583   2015-02-11  2015-02-27
P03251  100349583   2015-02-11  2015-02-27
900234  100349583   2015-02-28  2015-02-28
P03251  100349583   2015-02-28  2015-02-28
P03251  100349583   2015-03-01  2015-03-31

I need to get one row for each position change with min(hr_begin_date) and max(hr_end_dt).

job_cd   empl_id     hr_begin_dt hr_end_dt
900234  100349583   2014-10-10  2015-02-07
P03251  100349583   2015-02-08  2015-02-10
900234  100349583   2015-02-08  2015-02-27
P03251  100349583   2015-02-11  2015-02-27
.....

I was able to create a logic to see the each change in the position, but then can not build any logic to get the above table. If some one have a better idea how to do that. Thanks.

with CTE as
(
select 
    ROW_NUMBER() over (order by hr_begin_dt asc) as row_no_
    ,job_cd
    ,empl_id
    ,hr_begin_dt
    ,hr_end_dt
from hrbi.hrbi_active_headcount 
where  empl_id in (100349583)
order by hr_begin_dt asc
) 
select 
    f.row_no_ as f_row_no_
    ,f.job_cd as f_job_cd
    ,f.empl_id as f_empl_id
    ,f.hr_begin_dt as f_hr_begin_dt
    ,s.row_no_
    ,s.job_cd
    ,s.empl_id
    ,s.hr_begin_dt
    ,case 
        when f.job_cd <> s.job_cd then f.row_no_ + s.row_no_
        else 0
    end as job_change_flag
from CTE as f
left join CTE as s
on f.row_no_ = s.row_no_ + 1
order by f.hr_begin_dt, f.row_no_;

Gordon Linoff · Accepted Answer

This is a gaps-and-islands problem. For this version, a difference of row numbers works:

select empl_id, job_cd, min(hr_begin_dt), max(hr_end_dt)
from (select ahc.*,
             row_number() over (partition by empl_id order by hr_begin_dt) as seqnum,
             row_number() over (partition by empl_id, job_cd order by hr_begin_dt) as seqnum_2
      from hrbi.hrbi_active_headcount ahc
     ) ahc
group by empl_id, job_cd, (seqnum - seqnum_2);

Get the minimum and maximum of date from a historical table in one row for postgres/redshift?

Answers (2)

Related Questions