Complex data restructure issue in SAS

Question

I have a data set of card history as below. For each customer, they may have applied for one or multiple cards on the same day. However, due to various reasons, their cards get replaced. Card issue date is when a card is issued. New Card ID is the ID for the replaced card. For example, for customer A, his card was firstly issue on 2/1/2017 and the card ID is 1234. 3 days later, he lost his card and a new card (1235) is issued on 5/2/2017.

Customer ID First Issue Date    Card Issue Date Card ID New Card ID
A   2/1/2017    2/1/2017    1234    1235
A   2/1/2017    5/2/2017    1235    　
B   5/2/2017    5/2/2017    1245    1248
B   5/2/2017    5/2/2017    1236    1249
B   5/2/2017    10/3/2017   1248    1250
B   5/2/2017    5/3/2017    1249    1251
B   5/2/2017    10/4/2017   1250    　
B   5/2/2017    5/4/2017    1251

What I want is to group the original card and all the replacement together. For example, customer B applied for two cards on 5/2/217. Card ID 1245, 1248 and 1250 are in the same group (Seq No 1) and Card ID 1236, 1249 and 1251 are in the same group (Seq No 2).

Customer ID Open Date   Card Issue Date Card ID Seq No
A   2/1/2017    2/1/2017    1234    1
A   2/1/2017    5/2/2017    1235    1
B   5/2/2017    5/2/2017    1245    1
B   5/2/2017    10/3/2017   1248    1
B   5/2/2017    10/4/2017   1250    1
B   5/2/2017    5/2/2017    1236    2
B   5/2/2017    5/3/2017    1249    2
B   5/2/2017    5/4/2017    1251    2

Please help me with this data transformation.

Here is the data step for the input file

data test;
infile datalines dsd truncover ;
input Customer:$1.
First_Issue_Date: ddmmyy10.
Card_Issue_Date: ddmmyy10.
Card_ID: $4.
New_Card_ID: $4. ;
format First_Issue_Date ddmmyy10. Card_Issue_Date ddmmyy10.;
datalines;
A,02/01/2017,02/01/2017,1234,1235,
A,02/01/2017,05/02/2017,1235,,
B,05/02/2017,05/02/2017,1245,1248,
B,05/02/2017,05/02/2017,1236,1249,
B,05/02/2017,10/03/2017,1248,1250,
B,05/02/2017,05/03/2017,1249,1251,
B,05/02/2017,10/04/2017,1250,,
B,05/02/2017,05/04/2017,1251,,
;

Richard · Accepted Answer

The DATA Step hash object is very effective for traversing paths in identity tracked data. Presuming every Card_ID is unique over all customers, and each New_Card_ID value has a corresponding Card_ID value in the data set, then this code will find unique path ids amongst the myriad of reissues.

data paths(keep=card_id path_id);
  if 0 then set have; * prep pdv;

  call missing (Path_ID);

  * for tracking the tip of the card_id trail;
  DECLARE HASH currentCard(hashexp: 9);
  currentCard.defineKey ('Card_ID');
  currentCard.defineData ('Card_ID', 'Path_ID');
  currentCard.defineDone();

  * for tracking everything up to the tip (segment);
  DECLARE HASH replacedCard(hashexp:10);
  replacedCard.defineKey ('New_Card_ID');     
  replacedCard.defineData('Card_ID');
  replacedCard.defineDone();

  * fill the two hashes;
  do until (lastrow);
    set have (keep=Card_ID New_Card_ID) end=lastrow;

    if missing(New_Card_ID) then 
      Path_ID + 1;

    if missing(New_Card_ID)
      then currentCard.add();
      else replacedCard.add(); 
  end;

  * for each tip of a path output the tip and all its segments;
  declare hiter tipIter('currentCard');
  do while (tipIter.next() = 0);
    output; * tip;

    do while (replacedCard.find(key:Card_ID) = 0);
      output; * segment;
    end;
  end;

  stop;
run;

If you really need Seq = 1..N within Customer you will have to do additional sorting and merging.

My NESUG 2009 paper "Using HASH to find a sum over a transactional path" has a similar discussion about linked transactions.

Complex data restructure issue in SAS

Answers (2)

Related Questions