Reputation: 1
Here is an example of data (I'm just sharing an example of one person to hep illustrate my question):
ID | DATE |
---|---|
1 | January 1, 2021 |
1 | February 1, 2021 |
1 | March 1, 2021 |
1 | April 1, 2021 |
1 | May 1, 2021 |
1 | June 1, 2021 |
ID = a unique identifier for a person
DATE = when that person enrolled in a program
A person can enroll in a program for up to 40 days. After that period is up, they become eligible for enrollment again. However, sometimes (for various administrative purposes) someone may enter an enrollment for someone before they are eligible. I am trying to figure out the best way to flag which enrollments were actually eligible.
So for example, the January enrollment is fine because that was the member's first enrollment (so it should be flagged). However, their February enrollment did not occur after 40 days of the prior eligible enrollment (from January), and it should not be flagged. The March enrollment did occur more than 40 days after the eligible January enrollment, so it should be flagged. The April enrollment did not occur after 40 days of the prior eligible enrollment (March) and so it should not be flagged......
I suspect I need to create some sort of loop (I would know how to do this in other programming languages but not SQL. My thinking is along the lines of:
January is the first enrollment so it should be flagged and move to the next row (January stored as the date reference).
February did not occur more than 40 days after the January reference so it should not be flagged, move to the next row (January still the reference).
March did occur more than 40 days after the January reference so it should be flagged, move to the next row (March is now the reference)
.....
Can anyone help me figure out how I could accomplish this?
Upvotes: 0
Views: 100
Reputation:
Here is a brief demo using match_recognize
(which has been available since Oracle 12.1). It takes some time to learn it if you don't know it already; if you are familiar with analytic functions and with regular expressions (for strings), the learning will be easier.
First some test data (creating a small table):
create table enrolments (id, enrolment_date) as
select 1, date '2021-01-01' from dual union all
select 1, date '2021-02-01' from dual union all
select 1, date '2021-03-01' from dual union all
select 1, date '2021-04-01' from dual union all
select 1, date '2021-05-01' from dual union all
select 1, date '2021-06-01' from dual union all
select 8, date '2021-01-05' from dual union all
select 8, date '2021-03-10' from dual union all
select 8, date '2021-03-15' from dual union all
select 8, date '2021-04-10' from dual
;
A few notes here - DATE
is a reserved keyword, so it can't be a column name (I hope it isn't in your real-life data); in my example the date column is date
data type (and I hope it is in your real-life data too, rather than being string data type); and I used the date literal syntax to insert dates.
Then the query and output (and a command to change date formatting in the output to match yours):
alter session set nls_date_format = 'fmMonth dd, yyyy';
select id, enrolment_date, flag
from enrolments
match_recognize(
partition by id
order by enrolment_date
measures nullif(classifier(), 'NOT_ELIGIBLE') as flag
all rows per match
pattern ( ELIGIBLE NOT_ELIGIBLE* )
define NOT_ELIGIBLE as enrolment_date < ELIGIBLE.enrolment_date + 40
);
ID ENROLMENT_DATE FLAG
-- ----------------- -----------
1 January 1, 2021 ELIGIBLE
1 February 1, 2021
1 March 1, 2021 ELIGIBLE
1 April 1, 2021
1 May 1, 2021 ELIGIBLE
1 June 1, 2021
8 January 5, 2021 ELIGIBLE
8 March 10, 2021 ELIGIBLE
8 March 15, 2021
8 April 10, 2021
Upvotes: 2