min wanh
min wanh

Reputation: 3

SAS - Loop for variable and compare dates from different entries?

I'm very new to SAS, trying to learn everything I need for my analytical task. The task I have now is to create a flag for the ongoing application. I think it might be easier to show it in a table, just to illustrate my problem:enter image description here

[Update 2017.10.27] data sample in code, big thanks to Richard :)

   data sample;
    input PeopleID ApplicationID Applied_date yymmdd10. Decision_date yymmdd10. Ongoing_flag_wanted;
format Applied_date Decision_date yymmdd10.;
datalines;
1 6 2017.10.1 2017.10.1 1
1 5 2017.10.1 2017.10.4 0
1 3 2017.9.28 2017.9.29 1
1 2 2017.9.26 2017.9.26 1
1 1 2017.9.25 2017.9.30 0
2 8 2017.10.7 2017.10.7 1
2 7 2017.10.2 .         0
3 4 2017.9.30 2017.10.3 0
run;

In the system, people apply for the service. When a person does that, he gets a PeopleID, which does not change when the person applies again. And also each application gets an applicationID, which is unique and later applications have larger applicationID. What I want is to create an Ongoing flag for each application. The propose is to show that: by the time this application came in, the same person has or does not have an ongoing application (application which has not received a decision). See some examples from the table above:

Person#2 has two applications #8 and #7, by the time he applied #8, #7 has not been decided, therefore #8 should get ongoing flag.

Person#1 applied multiple times. Application #3 and #2 have ongoing application due to App#1. Application #6 and #5 came in at the same date, but according to application ID, we can tell that #6 came in later than #5, and as #5 have not been decided by then, #6 gets ongoing flag.

As you might notice, application with a positive ongoing flag always receives decisions on the same date as it came in. That is because applications with ongoing cases are automatically declined. However, I cannot use this as an indicator: there are many other reasons that trigger an automatic decline.

The ongoing_flag is what I want to create in my dataset. I have tried to sort by 1.peopleID, 2.descending applicationID, 3. descending applied_date, so my entire dataset looks like the small example table above. But then I don't know how to make SAS compare within the same variable (peopleID) but different lines (applicationID) and columns (compare Applied_date with Decision_date). I want to compare, for each person, every application's applied_date with all the previous applications' decision_date, such that I can tell by the time this application came in, whether or not there is an ongoing application from previously in the system.

I know I used too many words to explain my problem. For those who read through, thank you for reading! For those who have any idea on what might be a good approach, please leave your comments! Millions of thanks!

Upvotes: 0

Views: 1040

Answers (1)

Richard
Richard

Reputation: 27536

Min:

For problems of this type you want to mentally break the data structure into different parts.

BY GROUP The variables whose unique combination defines the group. There are one or more rows in a group. Let's call them items.

GROUP DETAILS Variables that are observational in nature. They may be numbers such as temperature, weight or dollars, or, characters or strings that represent some state being tracked. The details (at the state you are working) themselves might be aggregates for a deeper level of detail.

GOAL Compute additional variables that further elucidate an aspect of the details over the group. For numeric the goal might be statistical such as MIN, MAX, MEAN, MEDIAN, RANGE, etc. Or it might be identificational such as which ID had highest $, or which name was longest, or any other business rule.

Your specific problem is one of determining claim activity on a given date. I think of it as a coverage type of problem because the dates in question cover a range. The BY GROUP is person and an 'Activity' date.

Here is one data-centric approach. The original data is expanded to have one row per date from applied to decided. Then simple BY group processing and the automatic first. are used to determine if an application is during one as yet undecided.

data have;
input PeopleID ApplicationID Applied_date yymmdd10. Decision_date yymmdd10. Ongoing_flag_wanted;
format Applied_date Decision_date yymmdd10.;
datalines;
1 6 2017.10.1 2017.10.1 1
1 5 2017.10.1 2017.10.4 0
1 3 2017.9.28 2017.9.29 1
1 2 2017.9.26 2017.9.26 1
1 1 2017.9.25 2017.9.30 0
2 8 2017.10.7 2017.10.7 1
2 7 2017.10.2 .         0
3 4 2017.9.30 2017.10.3 0
run;

data coverage;
  do _n_ = 1 by 1 until (last.PeopleID);
    set have;
    by PeopleID;
    if Decision_date > Max_date then Max_date = Decision_date;
  end;

  put 'NOTE: ' PeopleID= Max_date= yymmdd10.;

  do _n_ = 1 to _n_;
    set have;
    do Activity_date = Applied_date to ifn(missing(Decision_date),Max_date,Decision_date);
      if missing(Decision_date) then Decision_date = Max_date;
      output;
    end;
  end;

  keep PeopleID ApplicationID Applied_date Decision_date Activity_date;
  format Activity_date yymmdd10.;
run;

proc sort data=coverage;
  by PeopleID Activity_date ApplicationID ;
run;

data overlap;
  set coverage;
  by PeopleID Activity_date;

  Ongoing_flag = not (first.Activity_date);

  if Activity_date = Applied_date then
    output;
run;

proc sort data=overlap;
  by PeopleID descending ApplicationID ;
run;

Other approaches could involve arrays, hashes, or SQL. SQL is very different from DATA Step code and some consider it to be more clear.

proc sql;
  create table want as
  select
    PeopleID, ApplicationID, Applied_date, Decision_date
  , case
      when exists (
        select * from have as inner 
        where inner.PeopleID = outer.PeopleID
          and inner.ApplicationID < outer.ApplicationID
          and 
            case
              when inner.Decision_date is null and outer.Decision_date is null then 1
              when inner.Decision_date is null then 1
              when outer.Decision_date is null then 0
              else outer.Decision_date < inner.Decision_date
            end
      )
      then 1
      else 0
    end as Ongoing_flag
  from have as outer
  ;

Upvotes: 2

Related Questions