CuteMeowMeow

Reputation: 91

How does SAS iteration work with the _N_ variable and IF/ELSE?

I have encountered the following code, which behaves in a way I find quite unintuitive. I expected SAS to process the statements one after another, but the code seems to somehow jump back to the previous statements, like a loop.

Note the difference in flag and pageit between reset and reset_p2.

Generate the dataset:

data new;
  do i=1 to 100;
    if i < 72 then type='first';
    else type='last';
    newval='newval'||left(i);
    output;
  end;
run;

The mystery code:

data reset;
  set new;
  by type;
  if _n_ eq 1 then flag=0;   
  else flag+1;
  if flag>=25 then do;
    pageit+1; 
    flag=0;
  end;
run;

To understand this code, I tried separating it into two steps:

data reset_p1;
  set new;
  by type;
  if _n_ eq 1 then flag=0;   
  else flag+1;
run;

data reset_p2;
  set reset_p1;
  if flag>=25 then do;
    pageit+1; 
    flag=0;
  end;
run;

The pageit and flag columns differ between reset and reset_p2.

[Image: pageit and flag in reset]

[Image: pageit and flag in reset_p2]

This means the code does not seem to run step by step, but somehow jumps back to the "if _n_ eq 1" part. Can anyone explain why this happens? It is really not intuitive.

Upvotes: 1

Views: 821

Answers (3)

Tom

Reputation: 51601

The original data step is making a NEW FLAG variable that does not exist in the input dataset. Because you are using the sum statement, the variable's value is retained instead of being reset to missing when the implicit data step loop restarts.
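As a sketch of what that means: the SAS documentation describes the sum statement `flag+1;` as equivalent to `retain flag 0;` plus `flag=sum(flag,1);`. Rewriting the original step that way makes the retention explicit. This assumes the NEW dataset from the question exists; the output name `reset_explicit` is only illustrative.

```sas
data reset_explicit;
  set new;
  by type;
  /* What the two sum statements imply: keep the values across   */
  /* iterations instead of resetting them to missing, start at 0 */
  retain flag 0 pageit 0;
  if _n_ eq 1 then flag=0;
  else flag=sum(flag,1);        /* same as: flag+1;   */
  if flag>=25 then do;
    pageit=sum(pageit,1);       /* same as: pageit+1; */
    flag=0;
  end;
run;
```

Since FLAG is not on NEW, nothing overwrites the retained value: the IF _N_ EQ 1 branch runs only on the first iteration, and every later iteration adds 1 to whatever FLAG held at the end of the previous one.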

But in the second data step the FLAG variable already exists on the input dataset, so its value is overwritten every time the SET statement reads an observation. Once you are past observation number 25, the (flag >= 25) condition is therefore always true: FLAG gets reset from 25, 26, 27, etc. to 0 on every observation, and PAGEIT increases by 1 each time.
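One way to watch the overwrite is to trace the second step with PUT statements. This sketch assumes RESET_P1 from the question exists; it reads only the first 28 observations and writes to the log (`data _null_`), so no dataset is created.

```sas
data _null_;
  set reset_p1(obs=28);   /* SET reloads FLAG from RESET_P1 on every iteration */
  put 'before: ' _n_= flag= pageit=;
  if flag>=25 then do;
    pageit+1;             /* PAGEIT is not on RESET_P1, so it is retained */
    flag=0;               /* this 0 is replaced by the next SET */
  end;
  put 'after:  ' _n_= flag= pageit=;
run;
```

From iteration 26 onward, each "before" line shows the FLAG value just read from the input (25, 26, 27), so the condition is true every time and PAGEIT grows by one per observation.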

When using RETAIN (which the sum statement implies), make sure you are using a NEW variable that is not being read in from an input dataset.
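Applied to the two-step version, a minimal sketch of that advice keeps the counter in a variable that RESET_P1 does not contain. The names `counter` and `reset_p2_fixed` are hypothetical, chosen only for illustration.

```sas
data reset_p2_fixed;
  set reset_p1;                 /* FLAG from the input: 0, 1, 2, ..., 99 */
  if _n_ eq 1 then counter=0;
  else counter+1;               /* COUNTER is not on RESET_P1, so it stays retained */
  if counter>=25 then do;
    pageit+1;
    counter=0;
  end;
run;
```

Here COUNTER takes the same values as FLAG in the original RESET step, and PAGEIT matches as well, while the FLAG read from the input is left untouched.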

If you already have a FLAG variable that has values 0,1,2,...24,25,26,... and you want to split it into bins of 25 values, then just use arithmetic.

new_flag = mod(flag,25);
new_page = int(flag/25);
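As a sketch, those two lines can run as a full step on RESET_P1 from the question (the output name `reset_arith` is illustrative). No value has to survive from one iteration to the next, so it does not matter that FLAG is read from the input.

```sas
data reset_arith;
  set reset_p1;               /* FLAG = 0, 1, 2, ..., 99 */
  new_flag = mod(flag,25);    /* position within each bin of 25: 0-24 */
  new_page = int(flag/25);    /* bin number: 0, 1, 2, 3 */
run;
```

NEW_FLAG and NEW_PAGE should line up with FLAG and PAGEIT from the original RESET step, because there FLAG is `mod(_n_-1,25)` and PAGEIT is `int((_n_-1)/25)`.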

Upvotes: 3

gaurav p

Reputation: 11

If you modify your reset code as below,

data reset;
  set new;
  by type;
  if _n_ eq 1 then flag=0;   
  else flag+1;
  if flag>=25 then do;
    pageit+1; 
    flag1=0;
  end;
run;

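you can see how the code resolves when flag>=25: since FLAG is never reset, the condition stays true from observation 26 onward and PAGEIT increases by 1 on every observation, just as in reset_p2.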

Upvotes: 1

johnjps111

Reputation: 1170

Explanation: the automatic variable _N_ counts the iterations of the DATA step, and here the step iterates once for each row of data. You have 100 rows, so the step runs 100 times no matter what. In your code, you increment flag on every iteration but reset it to 0 every time it reaches 25; also, when it reaches 25, you increment pageit. And... that is exactly what your reset output looks like, so I'm not sure what your confusion is.
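To see that _N_ simply counts iterations and that nothing jumps backwards, a sketch can trace the original step for the first 28 observations (again assuming the NEW dataset from the question; `data _null_` only writes to the log):

```sas
data _null_;
  set new(obs=28);
  by type;
  if _n_ eq 1 then flag=0;    /* true only on the first iteration */
  else flag+1;                /* FLAG still holds its value from the previous iteration */
  if flag>=25 then do;
    pageit+1;
    flag=0;
  end;
  put _n_= flag= pageit=;     /* one log line per iteration */
run;
```

The log lists _N_=1 through _N_=28 in order. FLAG climbs from 0 to 24; at _N_=26 it reaches 25, so PAGEIT becomes 1 and FLAG is set to 0, and the next iteration continues counting from that retained 0.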

Upvotes: 1
