Dazed and Confused
Dazed and Confused

Reputation: 81

Backfilling with following row details

I have a table like the 1st 3 columns, and need to add the fourth column:

Sequence ID Last Status  Current Status
1        1               New
2        1  New          Open    
3        1               Open
4        1               Open
5        1               Open
6        1  Open         Closed
7        1               Closed
8        1               Closed
9        1  Closed       
10       2               New
11       2  New          Open
12       2               Open
13       2  Open         Closed
14       2  Closed       
15       3               New
16       3  New          Open
etc.       

Basically it currently only shows the last status at the point the status was changed - I need to display the 'current' status at the time.

I'm new to SQL, and thought I could use LEAD, but unfortunately it's not supported in SAS.

Thanks!

Upvotes: 1

Views: 951

Answers (2)

Joe
Joe

Reputation: 63434

Here's the data step code that should do what you want. It doesn't do the final row correctly, since you don't have any further rows to indicate that it should be Open. If there is a data logic rule that can be applied, that should be easy to add.

It does require two sorts, which can be expensive, but likely the SQL Join will require similar maneuvers behind the scenes. If you have large amounts of data, I recommend testing both methods to see which is faster.

data have;
input Sequence ID LastStatus  $;
infile datalines missover;
datalines;
1        1               
2        1  New           
3        1               
4        1               
5        1               
6        1  Open         
7        1               
8        1               
9        1  Closed       
10       2               
11       2  New          
12       2               
13       2  Open         
14       2  Closed       
15       3               
16       3  New          
;;;;
run;

proc sort data=have;
by id  descending sequence;
run;

data want;
set have;
length CurrentStatus $8;
by id  descending sequence;      *so the last sequence for each ID comes first;
retain CurrentStatus;            *keep its value for each row;
if first.id then 
 call missing(CurrentStatus);    *reset for each ID;
output;                          *put out the current line;
if not missing(LastStatus) then  
   CurrentStatus = LastStatus;   *now update CurrentStatus if needed;
run;

proc sort data=want;             *resort in logical order;
by id sequence;
run;

Upvotes: 2

Gordon Linoff
Gordon Linoff

Reputation: 1270873

The following SQL will work in most dialects, including SAS SQL:

select t.quence, t.id, t.lastStatus, tnext.lastStatus as CurrentStatus
from (select t.*,
             (select min(Sequence) from t t2 where t2.id = t.id and t2.sequence > t.sequence and t2.LastStatus is not null
             ) as NextId
      from t
     ) t left out join
     t tnext
     on t.sequence = tnext.sequence

It finds the next status record with the same id (if any) and then joins the results back in to get the status.

In SAS, I would be more inclined to do this use data step code. If the data is stored in a particular database, there may be simpler solutions specific to the database.

Upvotes: 2

Related Questions