Colin McKay
Colin McKay

Reputation: 1

How to add sequential ID based on condition SAS

I have the dataset with Time and Interval variable as below. I would like to add a sequential ID (Indicator) with SAS based on a condition that Interval is greater than 0.1, as follows:

Time Interval Indicator
11:40:38 0.05 .
11:40:41 0.05 .
11:40:44 0.05 .
11:40:47 0.05 .
11:40:50 0.05 .
11:42:50 2 1
11:42:53 0.05 2
11:42:56 0.05 3
11:42:59 0.05 4
11:43:02 0.05 5
11:43:05 0.05 6
11:43:08 0.05 7
11:43:18 0.16667 1
11:43:21 0.05 2
11:43:24 0.05 3
11:43:27 0.05 4
11:43:30 0.05 5
11:43:33 0.05 6

If I use the code

`data out1; set out ;
 by Time;
 retain indicator;
 if Interval > 0.1 then indicator=1;
 indicator+1;
 run;`

Indicator is not missing for the first five observations. I would like that it starts counting only when the condition is met (Interval > 0.1).

Thanks!

Upvotes: 0

Views: 335

Answers (3)

Richard
Richard

Reputation: 27508

For yucks, here is a one-liner of @WhyMath logic.

data want;
  set have;
  retain seq;
  seq = ifn(interval > 0.1, 1, ifn(seq, sum(seq,1), seq));
run;

Upvotes: 1

Tom
Tom

Reputation: 51621

If you want to retain INDICATOR it cannot be on the input dataset, otherwise the SET statement will overwrite the retained value with the value read from the existing dataset.

If you want INDICATOR to start as missing when using the SUM statement then you need to explicitly say so in the RETAIN statement. Otherwise the SUM statement will cause the variable to be initialized to zero.

If looks like you only want to increment when the new variable has already been assigned at least one value.

data want;
  set have;
  retain new .;
  if interval>0.1 then new=1; 
  else if new > 0 then new+1;
run;

Results:

OBS        Time    Interval    Indicator    new

  1    11:40:38     0.05000        .         .
  2    11:40:41     0.05000        .         .
  3    11:40:44     0.05000        .         .
  4    11:40:47     0.05000        .         .
  5    11:40:50     0.05000        .         .
  6    11:42:50     2.00000        1         1
  7    11:42:53     0.05000        2         2
  8    11:42:56     0.05000        3         3
  9    11:42:59     0.05000        4         4
 10    11:43:02     0.05000        5         5
 11    11:43:05     0.05000        6         6
 12    11:43:08     0.05000        7         7
 13    11:43:18     0.16667        1         1
 14    11:43:21     0.05000        2         2
 15    11:43:24     0.05000        3         3
 16    11:43:27     0.05000        4         4
 17    11:43:30     0.05000        5         5
 18    11:43:33     0.05000        6         6

Upvotes: 0

whymath
whymath

Reputation: 1394

You can do it with a little modification:

data out1;
  set out ;
  retain indicator;
  if Interval>0.1 then indicator=0;
  if indicator^=. then indicator+1;
run; 

The summuation will start after the condition Interval>0.1 has been met, because indicator is equal to missing value before that, so indicator+1 would not be calculated.
And you need to initial indicator as 0, not 1. If indicator is equal to 0, indicator^=. will be satisfied and indicator+1 will be calculated.

Upvotes: 1

Related Questions