Reputation: 1
I need to create a row if Current End date compared to Start date from next row are discontinuous by each Employee Number. The dataframe looks like this:
Employee Number | Start Date | End Date |
---|---|---|
001 | 1999-11-29 | 2000-03-12 |
001 | 2000-03-13 | 2001-06-30 |
001 | 2001-07-01 | 2002-01-01 |
002 | 2000-09-18 | 2000-10-05 |
002 | 2000-10-06 | 2001-06-30 |
002 | 2004-05-01 | 2005-12-31 |
002 | 2008-01-01 | 2008-11-25 |
A Continuous flag column needs to identify these discontinuous values:
Employee Number | Start Date | End Date | Continuous Flag | Explanation |
---|---|---|---|---|
001 | 1999-11-29 | 2000-03-12 | Y | 2000-03-13 is 1d after 2000-03-12 |
001 | 2000-03-13 | 2001-06-30 | Y | 2001-07-01 is 1d after 2001-06-30 |
001 | 2001-07-01 | 2002-01-01 | NaN | missing 2023-01-01 End Date row |
002 | 2000-09-18 | 2000-10-05 | Y | 2000-10-06 is 1d after 2000-10-05 |
002 | 2000-10-06 | 2001-06-30 | N | 2004-05-01 is not 1d after 2001-06-30 |
002 | 2004-05-01 | 2005-12-31 | N | 2008-01-01 is not 1d after 2005-12-31 |
002 | 2008-01-01 | 2008-11-25 | NaN | missing 2023-01-01 End Date row |
Then, for those rows that are 'N', a row needs to be inserted with the discontinuous dates to make them continuous in between rows. If there is no next row, use '2023-01-01' by default. Here is the expected output:
Employee Number | Start Date | End Date | Continuous Flag |
---|---|---|---|
001 | 1999-11-29 | 2000-03-12 | Y |
001 | 2000-03-13 | 2001-06-30 | Y |
001 | 2001-07-01 | 2002-01-01 | Y |
001 | 2002-01-02 | 2023-01-01 | NaN |
002 | 2000-09-18 | 2000-10-05 | Y |
002 | 2000-10-06 | 2001-06-30 | Y |
002 | 2001-07-01 | 2004-04-30 | Y |
002 | 2004-05-01 | 2005-12-31 | Y |
002 | 2006-01-01 | 2007-12-31 | Y |
002 | 2008-01-01 | 2008-11-25 | Y |
002 | 2008-11-26 | 2023-01-01 | NaN |
I tried idx for loop without success
Upvotes: 0
Views: 58
Reputation: 142208
Plan A: (Filling in gaps)
seq_...
, but messier in MySQL.)SELECT ... FROM that-table-of-dates LEFT JOIN your-table ON ...
As for filling in the gaps with values before (or after) the given hole. I don't understand the goals.
Plan B: (Simply discovering gaps)
Do a "self-join" of the table with itself. For this you must have consecutive ids. Since you don't have such, I am not sure what to do.
Then check whether the (end_date + INTERVAL 1 DAY)
of one row matches the start_date
of the 'next' row.
Plan C: (requires MySQL 8.0 or MariaDB 10.2)
Use LAG()
(or `LEAD() windowing functions to compare a value in one row to the previous (or next) row.
This may be the simplest way to set the "continuous flag".
Be sure to check for discontinuity in EmployeeId as well as INTERVAL 1 DAY
as mentioned above.
Upvotes: 1