Reputation: 155
I want to manipulate drug prescription data using PostgreSQL 9.6.
The sample data structure is as below.
create table drug_table (
id int,
start_date date,
end_date date,
dose int
);
insert into drug_table values(1005, '2010-01-08', '2010-02-05', 15);
insert into drug_table values(1005, '2010-01-30', '2010-02-28', 10);
insert into drug_table values(1005, '2010-03-02', '2010-03-10', 20);
insert into drug_table values(1005, '2010-03-12', '2010-04-28', 20);
insert into drug_table values(1005, '2010-04-25', '2010-05-15', 20);
As the sample date, there is overlapping period between each row. Between first and second rows, period of [2010-02-05 ~ 2010-01-30] is overlapped.
When overlapped period is present, the start and end date of latter row should be delayed (in this case, the second row will have '2010-02-05' of start date and '2010-03-06' of end date considering 6 day of overlapping between first and second rows).
I tried this problem using window function.
select id,
GREATEST(start_date, MAX(end_date) OVER (PARTITION BY id ORDER BY id, end_date ROWS BETWEEN UNBOUNDED PRECEDING AND 1 preceding)) re_start_date,
(GREATEST(start_date, MAX(end_date) OVER (PARTITION BY id ORDER BY id, end_date ROWS BETWEEN UNBOUNDED PRECEDING AND 1 preceding)) + interval '1' day *(end_date-start_date))::date re_end_date,
dose
from drug_table
order by id, start_date, end_date;
The result of this sql code is below.
id re_start_date re_end_date dose
1005 2010-01-08 2010-02-05 15
1005 2010-02-05 *2010-03-06* 10
1005 *2010-03-02* 2010-03-10 20
1005 2010-03-12 2010-04-28 20
1005 2010-04-28 2010-05-18 20
However, if overlapping occurred between 2nd and 3rd prescription due to the renewal date of 2nd row by overlapping between 1st and 2nd rows, the 3rd row did not reflect renewal date of 2nd row in this code. There is overlapping period of [2010-03-06 ~ 2010-03-02] between 2nd and 3rd rows. I want to make the 3rd row into '2010-03-06' of start date and '2010-03-14' of end date reflecting the renewal data of 2nd row.
This table is the result I want.
id re_start_date re_end_date dose
1005 2010-01-08 2010-02-05 15
1005 2010-02-05 2010-03-06 10
1005 2010-03-06 2010-03-14 20
1005 2010-03-14 2010-04-30 20
1005 2010-04-30 2010-05-18 20
The 4th, 5th row is also delayed considering the delayed date in proceeding rows.
If I reuse window function once more, It can reflect the delay when overlapping is present just twice. But when overlapping is more present (triple or more...), I think reuse of window function is not good solution. Loop function might be need for my purpose.
Can I get some hint solving this problem? If possible, I want deal this data using Postgresql.
Upvotes: 2
Views: 116
Reputation: 16377
I think you could accomplish this with a function:
CREATE OR REPLACE FUNCTION shift_dates() RETURNS setof drug_table AS
$BODY$
DECLARE
rw drug_table%rowtype;
last_date date;
shift_days integer;
last_id integer;
BEGIN
last_id = -314159;
for rw in select * from drug_table order by id, start_date loop
if rw.id != last_id then
last_date := '0001-01-01';
last_id := rw.id;
end if;
if rw.start_date < last_date then
shift_days := last_date - rw.start_date;
rw.start_date := last_date;
rw.end_date := rw.end_date + shift_days;
end if;
last_date := rw.end_date;
return next rw;
end loop;
return;
END;
$BODY$
LANGUAGE plpgsql VOLATILE
COST 100;
And then, of course:
select * from shift_dates();
I tested it on your data, and the last date did not agree -- I came up with 5/30 instead of 5/28, but I do believe there is a two day shift. In other words, I think 5/30 is right. Check it out and see if you agree:
id start_date end_date dose
1005 1/8/10 2/5/10 15
1005 2/5/10 3/6/10 10
1005 3/6/10 3/14/10 20
1005 3/14/10 4/30/10 20
1005 4/30/10 5/20/10 20
Also, the last_id
variable was not necessary, based on your sample data, but presupposing you want to do this against a range of id
values, then this (I hope) answers your next question.
Upvotes: 1