Reputation: 27
I'm trying to calculate the months between two dates between default for an account, where the beginning of the next period is dependent on the exit date for the previous. I've been able to do this for instances where all default periods have been completed and the account is no longer in default. However, I cannot figure it out for instances where the account is currently still in default and has not exited. For example, an account enters default in October 2016 and exits in October 2019. The same account enters default in February 2020 and exits in July 2021 before entering again in October 2021 and remaining until today. I've been able to calculate the months between exiting and entry for the October 2019 exit and February 2020 entrance, but not between the July 2021 exit and October 2021 entrance.
The dataset only has entries when the account has been in default. There are also variables called default event and max default event that I've been using.
So far my data looks like:
account number | obs date | entry date| exit date | prev exit date | default event | max default event|
---------------------------------------------------------------------------------------------------------
11111111 | 30JUN2017 | 31OCT2016 | 31OCT2019 | . | 1 | 3 |
11111111 | 31JUL2017 | 31OCT2016 | 31OCT2019 | . | 1 | 3 |
.
.
.
11111111 | 29FEB2020 | 29FEB2020 | 31JUL2021 | 31OCT2019 | 2 | 3 |
11111111 | 31MAR2020 | 29FEB2020 | 31JUL2021 | 31OCT2019 | 2 | 3 |
.
.
.
11111111 | 31OCT2021 | 31OCT2021 | . | . | 3 | 3 |
Previously, the data only consisted of the account number, the obs date, the default event, and the entry date. There is also a macro variable called end date equal to the end of the observation period that I'm looking at. I created the exit date and prev entry date variables like so:
data exit_dates;
set defaults;
by accnum default_event;
format exit_date date9.;
if last.default_event then
exit_flag = "Y";
else exit_flag = "N";
if exit_flag = "Y" then
exit_date = obs_date;
if default_flag = &end_date. then do;
exit_flag = "N";
exit_date = .;
end;
if exit_date ne .;
run;
I could then calculate the months between periods for those that are not still in default:
data months_between;
set exit_dates;
by accnum obs_date;
format prev_exit_date date9.;
prev_exit_date = lag(exit_date);
months_between = intck('month',prev_exit_date,entry_date);
if first.accnum then do;
prev_exit_date = .;
months_between = .;
end;
run;
I've been trying to use lags to isolate the final exit date so that it can then be placed as the prev exit date for the accounts that are still in default. Mainly like this:
if default_Event = max_default_Event and default_event ne lag(default_event) then
prev_exit_date = lag(exit_date)
However, this is causing some very strange results. I've done some research and it seems like it may be due to using the lag function in an if statement, but I haven't been able to figure out another way to accomplish this without using lags. I've also tried to sort the dataset and drop the duplicates based on account number and default event, hopefully keeping the one instance that would populate with the correct prev exit date, but this has also not worked and led mainly to blanks that I don't understand why they don't populate.
Upvotes: 0
Views: 38
Reputation: 51566
The LAG() function needs to be called on every value since its source of values to return are the values it receives when it is called.
I suspect you just want to retain the previous exit date.
It is not clear what you want as output but here is how you can use a retained variable.
data exit_dates;
set defaults;
by accnum default_event;
retain last_exit_date ;
format last_exit_date date9.;
if first.accnum then last_exit_date=.;
if not missing(exit_date) then last_exit_date=exit_date;
run;
If the goal is to remember the last exit date PRIOR to the current "event" then perhaps you only want to change the value at the start or end of a set of observations for the same event.
Here is a method to update the value AFTER you have written the current observation to the new dataset.
data exit_dates;
set defaults;
by accnum default_event;
retain prior_exit_date ;
format prior_exit_date date9.;
* any code here that does calculations such as time since prior_exit_date ;
output;
if last.accnum then prior_exit_date=.;
if last.default_event and not missing(exit_date) then prior_exit_date=exit_date;
run;
Note that it does no good to RETAIN a variable that is coming in from a source dataset. First such variables are already being retained. But the main problem is that any retained value will be immediately overwritten by the value read from the dataset when the SET statement executes.
Upvotes: 1