Reputation: 686
I came across a rather weird situation where I concatenated two datasets and got really unexpected results. Here's the setup:
data a;
do i=1 to 100;
output;
end;
run;
data b;
do j=5 to 79;
output;
end;
run;
data c;
set a b;
if j=. then j=i;
run;
Can someone explain to me why the j takes on the value of "1" for the first 100 observations? It looks like j is being retained, but even weirder, if I change the conditional to:
data c;
set a b;
if i=. then i=j;
run;
Then a "retain" statement" would imply that after the 100th observation, you should see 100 retained down, but it isn't! It's 5! What is going on?
Thanks,
Upvotes: 1
Views: 59
Reputation: 51601
All variables that are sourced from input datasets are "retained". You just don't normally notice it because the values are overwritten when the SET statement executes. In your example dataset B
does not contribute to the first few observations so the value you put into the variable J
on the first iteration of the data step is not overwritten until you start reading from dataset B
.
The other wrinkle is that when you do make the transition from reading dataset A
to reading dateset B
then the variables that are sourced from A
are cleared.
Adding some PUT statements to your data step so you can watch what is happening will make it clearer.
data both;
put / 'TOP :' (_n_ i j) (=);
set a(obs=3) b(obs=3) ;
put 'MIDDLE:' (_n_ i j) (=);
if missing(j) then j=i ;
put 'BOTTOM:' (_n_ i j) (=);
run;
Results
TOP :_N_=1 i=. j=.
MIDDLE:_N_=1 i=1 j=.
BOTTOM:_N_=1 i=1 j=1
TOP :_N_=2 i=1 j=1
MIDDLE:_N_=2 i=2 j=1
BOTTOM:_N_=2 i=2 j=1
TOP :_N_=3 i=2 j=1
MIDDLE:_N_=3 i=3 j=1
BOTTOM:_N_=3 i=3 j=1
TOP :_N_=4 i=3 j=1
MIDDLE:_N_=4 i=. j=5
BOTTOM:_N_=4 i=. j=5
TOP :_N_=5 i=. j=5
MIDDLE:_N_=5 i=. j=6
BOTTOM:_N_=5 i=. j=6
TOP :_N_=6 i=. j=6
MIDDLE:_N_=6 i=. j=7
BOTTOM:_N_=6 i=. j=7
TOP :_N_=7 i=. j=7
Upvotes: 3