Binnnnn5
Binnnnn5

Reputation: 187

How to recognise a particular sequence in a dataset and mark it?

How to recognize the first "1,0" sequence in column "Flag" from each group and mark a "1" just like it in column "Flag2"?

ID  Flag    Flag2
1   1   
1   1   1
1   0   
1   1   
1   0   
1   0   
2   1   
2   1   
2   1   
2   1   1
2   0   
2   0   
3   0   
3   0   
3   0   
3   0   
4   1   
4   1   1
4   0   
4   1

Upvotes: 0

Views: 47

Answers (1)

Richard
Richard

Reputation: 27508

The problem requires using a 'lead' concept (value from next row) similar to the lag concept provided by the lag function. There is no built in lead function so you need to be creative.

  • Merge the data to itself, without a by statement, where the second version is:
    • Offset by one row by the firstobs data set option
    • Renames the variables so the lead state can be established with an if
  • A retained variable tracks if the 1,0 transition has been observed within the group.

Sample code:

data have;input
ID  Flag; datalines;
1   1
1   1
1   0
1   1
1   0
1   0
2   1
2   1
2   1
2   1
2   0
2   0
3   0
3   0
3   0
3   0
4   1
4   1
4   0
4   1
run;

data want;
  merge 
    have
    have(firstobs=2 rename=(id=lead_id flag=lead_flag))
;

  retain flagged_id;

  if (id=lead_id)                  /* lead is in same group */
  and (flag=1) and (lead_flag=0)   /* transition identified */
  and (flagged_id ne id) then      /* first such transition for group */
  do;
    flag2=1;                       /* flag the lead transition */
    flagged_id = id;               /* track id where transition last flagged */
  end;

  drop lead_: flagged:;
run;

Upvotes: 1

Related Questions