Reputation: 23
Is there a data driven way to dynamically change the variable names from one name to another where each dataset may or may not have the original variable? I want to iterate through each dataset to change the old variable (if it exists since some datasets have more variables than others) and assign it as a single new standardized variable name. How can this be accomplished in a data driven manner?
Should this be:
%let statement
array with do loop
Read in excel file with variable lists and rename statement old=new
macro
For example,
Dataset 1
ProvID NPI Txnmy_cd Txnmy_cd2 Spec_cd1 Spec_cd2 Spec_cd3 Spec_cd4
Dataset 2
ProvID NPI Txnmy_cd1 Txnmy_cd2 Txnmy_cd3 Txnmy_cd4 Spec_cd Spec_cd1
Dataset 3
ProvID NPI Txnmy_cd Txnmy_cd2 Txnmy_cd3 Txnmy_cd4 Spec_cd1 Spec_cd2
and so on...
Final Result:
Dataset 1
ProvID NPI Taxonomy Taxonomy2 Taxonomy3 ProvSpec1 ProvSpec2 ProvSpec3 ProvSpec4
Dataset 2
ProvID NPI Taxonomy Taxonomy2 Taxonomy3 Taxonomy4 ProvSpec1 ProvSpec2
Dataset 3
ProvID NPI Taxonomy Taxonomy2 Taxonomy3 Taxonomy4 ProvSpec1 ProvSpec2
and so on...
Upvotes: 0
Views: 1130
Reputation: 27508
The most deterministic way is to maintain a mapping table of name transformations.
Example (mapping based on final result shown in question)
xl_name sas_name based on
------- -------- -----------
Txnmy_cd Taxonomy per 1 2 3
Txnmy_cd1 Taxonomy per 2
Txnmy_cd2 Taxonomy2 per 1 2 3
Txnmy_cd3 Taxonomy3 per 2 3
Txnmy_cd4 Taxonomy4 per 2 3
Spec_cd ProvSpec1 per 2
Spec_cd1 ProvSpec1 per 1 3
Spec_cd1 ProvSpec2 per 2 # inconsistent
Spec_cd2 ProvSpec2 per 1 3
Spec_cd3 ProvSpec3 per 1
Spec_cd4 ProvSpec4 per 1
First, the result examples for sas_name
targets are inconsistent
Taxonomy
ProcSpec1
Second, the real world intrudes on a pristine mapping table
spec_cd1
to ProvSpec2
is not consistent with the other mappings shown for data 1 and data 3.Real question?
Q: Should the renames be based on variable name prefixes, and 1,2,3,4 suffix be based on order of appearance (repetition count) of suffix ?
Suppose that is the case. Use a prefix mapping table (or format) to locate the source variables that need to be renamed with an in order numeric suffix.
Example:
proc format;
value $prefix_map
'Txnmy_cd' = ':Taxonomy' /* : means 1st does not get 1 suffix */
'Spec_cd' = '#ProvSpec' /* # means 1st does get 1 suffix */
other = ' '
;
%macro rename_per_prefix(data=);
%local renames lib mem;
%let syslast = &data;
%let lib = %scan(&syslast,1,.);
%let mem = %scan(&syslast,2,.);
proc contents noprint data=&data out=_names_(keep=name varnum);
data _names_;
set _names_;
length sas_prefix $25;
do len = length(name) to 1 by -1 until(not missing(sas_prefix));
sas_prefix = put(substr(name,1,len),$prefix_map.);
end;
if not missing(sas_prefix);
proc sort
data=_names_; by sas_prefix varnum;
data _null_;
length renames $32767; retain renames;
do until (no_more_candidates);
do index = 1 by 1 until (last.sas_prefix);
set _names_ end=no_more_candidates;
by sas_prefix;
if sas_prefix =: ':' then
if index = 1 then suffix = ''; else suffix = cats(index);
else
if sas_prefix =: '#' then
suffix = cats(index);
newname = cats(substr(sas_prefix,2),suffix);
renames = catx(' ', renames, catx('=', name, newname));
end;
end;
call symput('renames', strip(renames));
run;
%if %length(%superq(renames)) %then %do;
proc datasets nolist lib=&lib;
modify &mem;
rename %superq(renames);
%end;
proc delete data=_names_;
run;
%mend;
* mimic data import;
data one;
call missing(ProvID,NPI,Txnmy_cd,Txnmy_cd2,Spec_cd1,Spec_cd2,Spec_cd3,Spec_cd4);
data two;
call missing(ProvID,NPI,Txnmy_cd1,Txnmy_cd2,Txnmy_cd3,Txnmy_cd4,Spec_cd,Spec_cd1);
data three;
call missing(ProvID,NPI,Txnmy_cd,Txnmy_cd2,Txnmy_cd3,Txnmy_cd4,Spec_cd1,Spec_cd2);
run;
* perform renames;
%rename_per_prefix(data=one)
%rename_per_prefix(data=two)
%rename_per_prefix(data=three)
What if
Suppose there were consistent transformations across all Excel sources.
After importing a data set data from Excel, you can use Proc CONTENTS
and the mapping table to programmatically generate the SAS code that performs the transforming renames.
The mapping table and codegen become more complicated if you allow the import process to bring in variable names that require name literals to specify. (I.e. import brings in a variable named 'Provided Specimen #1'n
Example:
data name_map;
length xl_name sas_name $32;
input
xl_name sas_name; datalines;
Txnmy_cd Taxonomy
Txnmy_cd1 Taxonomy
Txnmy_cd2 Taxonomy2
Txnmy_cd3 Taxonomy3
Txnmy_cd4 Taxonomy4
Spec_cd ProvSpec1
Spec_cd1 ProvSpec1
Spec_cd2 ProvSpec2
Spec_cd3 ProvSpec3
Spec_cd4 ProvSpec4
;
%macro rename_per_map(data=, map=name_map);
%local renames lib mem;
%let syslast = &data;
%let lib = %scan(&syslast,1,.);
%let mem = %scan(&syslast,2,.);
proc contents noprint data=&data out=_names_(keep=name);
proc sql noprint;
select catx('=',xl_name,sas_name)
into :renames separated by ' '
from _names_ source
join &map map
on source.name = map.xl_name
;
%if %length(%superq(renames)) %then %do;
proc datasets nolist lib=&lib;
modify &mem;
rename %superq(renames);
%end;
proc delete data=_names_;
run;
%mend;
* mimic data import;
data one;
call missing(ProvID,NPI,Txnmy_cd,Txnmy_cd2,Spec_cd1,Spec_cd2,Spec_cd3,Spec_cd4);
data two; /* tweaked to be consistent */
call missing(ProvID,NPI,Txnmy_cd1,Txnmy_cd2,Txnmy_cd3,Txnmy_cd4,Spec_cd,Spec_cd2);
data three;
call missing(ProvID,NPI,Txnmy_cd,Txnmy_cd2,Txnmy_cd3,Txnmy_cd4,Spec_cd1,Spec_cd2);
run;
* perform renames;
%rename_per_map(data=one)
%rename_per_map(data=two)
%rename_per_map(data=three)
Upvotes: 1