Reputation: 1396
Good day,
I have old data with different format and updating it with new. Idea is that variable WantedName
had been written with underscore Wanted_Name
whereas new data is without this. No big deal I though.
EDIT: WantedName
and Wanted_Name
are SAS dates.
Original data:
year Wanted_Name
2013 1234
2013 4321
2013 3241
and from year 2015 forward:
year WantedName
2015 5678
2015 8765
....
I tried the logic:
%macro macro_env;
data _null_;
call symput ("curr_year", year(date()) );
run;
data long_data;
set
%do year=2013 %to &curr_year.;
data.history&year.
%end;
;
WantedName=COALESCE(WantedName, Wanted_Name);
run;
%mend macro_env;
Nope. Did not work. For some reason it took the first value of Wanted_Name
and pasted that over the whole data range except for the new data:
year Wanted_Name WantedName
2013 1234 1234
2013 4321 1234
2013 3241 1234
....
2015 . 5678
2015 . 8765
Now I managed to fix the issue by removing the coalesce
function and adding a additional data statement:
data long_data;
set
%do year=2013 %to &curr_year.;
data.history&year.
%end;
;
run;
data long_data;
set long_data;
WantedName=COALESCE(WantedName, Wanted_Name);
run;
Question: What happens or why isn't the original macro_env working?
I thought datasets in set
were first loaded and then function applied. (Which is working in the latter data statement. Apparently not... Maybe?
Upvotes: 3
Views: 1581
Reputation: 63424
I think a better way to do this is to handle it with rename
in the set statement.
data long_data;
set
%do year=2013 %to &curr_year.;
data.history&year.
%if &year lt 2015 %then %do;
(rename=wanted_name=wantedname)
%end;
%end;
;
run;
Cheaper (rename is cheaper than coalesce
function) and simpler.
Upvotes: 2
Reputation: 51566
As answer form @Allan Bowe explains the issue is that variables sourced for datasets are automatically retained. So you could add code to clear them.
data long_data;
set data.history20: ;
WantedName=COALESCE(WantedName, Wanted_Name);
output;
call missing(wantedname,wanted_name);
run;
Or make a NEW variable.
data long_data;
set data.history20: ;
new_WantedName=COALESCE(WantedName, Wanted_Name);
format new_WantedName date9. ;
drop WantedName Wanted_Name ;
rename new_WantedName = WantedName ;
run;
Upvotes: 4
Reputation: 12691
The reason is that the pdv implicitly retains variables that have been "set", and wantedName
is not being flushed from the pdv when the earlier datasets (pre 2015) are being loaded.
The compiler made a space for it (by reading all the datasets in the set statement) but the set command does not (initially) replace the value of wantedName
.
So - when the first observation is read, wantedname
is missing and the first value of wanted_name
is applied.
WantedName=COALESCE(., 1234); /* obs 1 */
In the second iteration, the value of wantedname
has been retained, and so is used for every subsequent iteration until a dataset is read in that contains wantedname
.
WantedName=COALESCE(1234, 4321); /* obs 2 */
The second example uses a dataset from the beginning that has the wantedname
variable.
Great question!
Upvotes: 2