JKim
JKim

Reputation: 155

Using postgresql, how can I get a date value reflecting overlapping period

I want to manipulate drug prescription data using PostgreSQL 9.6.

The sample data structure is as below.

create table drug_table (
id int,
start_date date,
end_date date,
dose int
);

insert into drug_table values(1005, '2010-01-08', '2010-02-05', 15);
insert into drug_table values(1005, '2010-01-30', '2010-02-28', 10);
insert into drug_table values(1005, '2010-03-02', '2010-03-10', 20);
insert into drug_table values(1005, '2010-03-12', '2010-04-28', 20);
insert into drug_table values(1005, '2010-04-25', '2010-05-15', 20);

As the sample date, there is overlapping period between each row. Between first and second rows, period of [2010-02-05 ~ 2010-01-30] is overlapped.

When overlapped period is present, the start and end date of latter row should be delayed (in this case, the second row will have '2010-02-05' of start date and '2010-03-06' of end date considering 6 day of overlapping between first and second rows).

I tried this problem using window function.

select id,
    GREATEST(start_date, MAX(end_date) OVER (PARTITION BY id ORDER BY id, end_date ROWS BETWEEN UNBOUNDED PRECEDING AND 1 preceding)) re_start_date, 
    (GREATEST(start_date, MAX(end_date) OVER (PARTITION BY id ORDER BY id, end_date ROWS BETWEEN UNBOUNDED PRECEDING AND 1 preceding)) + interval '1' day *(end_date-start_date))::date re_end_date,
    dose
from drug_table
order by id, start_date, end_date;

The result of this sql code is below.

id  re_start_date  re_end_date  dose
1005  2010-01-08  2010-02-05  15
1005  2010-02-05  *2010-03-06*  10
1005  *2010-03-02*  2010-03-10  20
1005  2010-03-12  2010-04-28  20
1005  2010-04-28  2010-05-18  20

However, if overlapping occurred between 2nd and 3rd prescription due to the renewal date of 2nd row by overlapping between 1st and 2nd rows, the 3rd row did not reflect renewal date of 2nd row in this code. There is overlapping period of [2010-03-06 ~ 2010-03-02] between 2nd and 3rd rows. I want to make the 3rd row into '2010-03-06' of start date and '2010-03-14' of end date reflecting the renewal data of 2nd row.

This table is the result I want.

id  re_start_date  re_end_date  dose
1005  2010-01-08  2010-02-05  15
1005  2010-02-05  2010-03-06  10
1005  2010-03-06  2010-03-14  20
1005  2010-03-14  2010-04-30  20
1005  2010-04-30  2010-05-18  20

The 4th, 5th row is also delayed considering the delayed date in proceeding rows.

If I reuse window function once more, It can reflect the delay when overlapping is present just twice. But when overlapping is more present (triple or more...), I think reuse of window function is not good solution. Loop function might be need for my purpose.

Can I get some hint solving this problem? If possible, I want deal this data using Postgresql.

Upvotes: 2

Views: 116

Answers (1)

Hambone
Hambone

Reputation: 16377

I think you could accomplish this with a function:

CREATE OR REPLACE FUNCTION shift_dates() RETURNS setof drug_table AS
$BODY$
DECLARE
  rw drug_table%rowtype;
  last_date date;
  shift_days integer;
  last_id integer;
BEGIN

  last_id = -314159;

  for rw in select * from drug_table order by id, start_date loop

    if rw.id != last_id then
      last_date := '0001-01-01';
      last_id := rw.id;
    end if;

    if rw.start_date < last_date then
      shift_days := last_date - rw.start_date;
      rw.start_date := last_date;
      rw.end_date := rw.end_date + shift_days;
    end if;

    last_date := rw.end_date;

    return next rw;
  end loop;

  return;

END;
$BODY$
  LANGUAGE plpgsql VOLATILE
  COST 100;

And then, of course:

select * from shift_dates();

I tested it on your data, and the last date did not agree -- I came up with 5/30 instead of 5/28, but I do believe there is a two day shift. In other words, I think 5/30 is right. Check it out and see if you agree:

id      start_date  end_date    dose
1005    1/8/10      2/5/10      15
1005    2/5/10      3/6/10      10
1005    3/6/10      3/14/10     20
1005    3/14/10     4/30/10     20
1005    4/30/10     5/20/10     20

Also, the last_id variable was not necessary, based on your sample data, but presupposing you want to do this against a range of id values, then this (I hope) answers your next question.

Upvotes: 1

Related Questions