a_swoosh
a_swoosh

Reputation: 23

Want to dynamically change variable names in SAS

Is there a data driven way to dynamically change the variable names from one name to another where each dataset may or may not have the original variable? I want to iterate through each dataset to change the old variable (if it exists since some datasets have more variables than others) and assign it as a single new standardized variable name. How can this be accomplished in a data driven manner?

Should this be:

%let statement
array with do loop
Read in excel file with variable lists and rename statement old=new
macro

For example,

Dataset 1

ProvID NPI Txnmy_cd Txnmy_cd2 Spec_cd1 Spec_cd2 Spec_cd3 Spec_cd4

Dataset 2

ProvID NPI Txnmy_cd1 Txnmy_cd2 Txnmy_cd3 Txnmy_cd4 Spec_cd Spec_cd1

Dataset 3

ProvID NPI Txnmy_cd Txnmy_cd2 Txnmy_cd3 Txnmy_cd4 Spec_cd1 Spec_cd2

and so on...

Final Result:

Dataset 1

ProvID NPI Taxonomy Taxonomy2 Taxonomy3 ProvSpec1 ProvSpec2 ProvSpec3 ProvSpec4

Dataset 2

ProvID NPI Taxonomy Taxonomy2 Taxonomy3 Taxonomy4 ProvSpec1 ProvSpec2 

Dataset 3

ProvID NPI Taxonomy Taxonomy2 Taxonomy3 Taxonomy4 ProvSpec1 ProvSpec2

and so on...

Upvotes: 0

Views: 1130

Answers (1)

Richard
Richard

Reputation: 27508

The most deterministic way is to maintain a mapping table of name transformations.

Example (mapping based on final result shown in question)

xl_name   sas_name        based on
-------   --------        -----------
Txnmy_cd  Taxonomy        per 1 2 3
Txnmy_cd1 Taxonomy        per   2
Txnmy_cd2 Taxonomy2       per 1 2 3
Txnmy_cd3 Taxonomy3       per   2 3
Txnmy_cd4 Taxonomy4       per   2 3
Spec_cd   ProvSpec1       per   2
Spec_cd1  ProvSpec1       per 1   3
Spec_cd1  ProvSpec2       per   2     # inconsistent
Spec_cd2  ProvSpec2       per 1   3
Spec_cd3  ProvSpec3       per 1
Spec_cd4  ProvSpec4       per 1

First, the result examples for sas_name targets are inconsistent

  • for taxonomy the first rename is to Taxonomy
  • for provspec the first rename is to ProcSpec1

Second, the real world intrudes on a pristine mapping table

  • the final result data 2 transform of spec_cd1 to ProvSpec2 is not consistent with the other mappings shown for data 1 and data 3.

Real question?

Q: Should the renames be based on variable name prefixes, and 1,2,3,4 suffix be based on order of appearance (repetition count) of suffix ?

Suppose that is the case. Use a prefix mapping table (or format) to locate the source variables that need to be renamed with an in order numeric suffix.

Example:

proc format;
  value $prefix_map
  'Txnmy_cd' = ':Taxonomy'    /* : means 1st does not get 1 suffix */
  'Spec_cd'  = '#ProvSpec'    /* # means 1st does     get 1 suffix */
  other = ' '
  ;

%macro rename_per_prefix(data=);
  %local renames lib mem;

  %let syslast = &data;
  %let lib = %scan(&syslast,1,.);
  %let mem = %scan(&syslast,2,.);

  proc contents noprint data=&data out=_names_(keep=name varnum);
  data _names_;
    set _names_;

    length sas_prefix $25;
    do len = length(name) to 1 by -1 until(not missing(sas_prefix));
      sas_prefix = put(substr(name,1,len),$prefix_map.);
    end;

    if not missing(sas_prefix);
  proc sort 
    data=_names_; by sas_prefix varnum;
  data _null_;
    length renames $32767; retain renames;
    do until (no_more_candidates);
      do index = 1 by 1 until (last.sas_prefix);
        set _names_ end=no_more_candidates;
        by sas_prefix;

        if sas_prefix =: ':' then
          if index = 1 then suffix = ''; else suffix = cats(index);
        else
        if sas_prefix =: '#' then 
          suffix = cats(index);

        newname = cats(substr(sas_prefix,2),suffix);

        renames = catx(' ', renames, catx('=', name, newname));        
      end;
    end;

    call symput('renames', strip(renames));
  run;    

  %if %length(%superq(renames)) %then %do;
  proc datasets nolist lib=&lib;
    modify &mem;
    rename %superq(renames);
  %end;

  proc delete data=_names_;
  run;
%mend;

* mimic data import;
data one;
call missing(ProvID,NPI,Txnmy_cd,Txnmy_cd2,Spec_cd1,Spec_cd2,Spec_cd3,Spec_cd4);
data two;
call missing(ProvID,NPI,Txnmy_cd1,Txnmy_cd2,Txnmy_cd3,Txnmy_cd4,Spec_cd,Spec_cd1);
data three;
call missing(ProvID,NPI,Txnmy_cd,Txnmy_cd2,Txnmy_cd3,Txnmy_cd4,Spec_cd1,Spec_cd2);
run;

* perform renames;

%rename_per_prefix(data=one)
%rename_per_prefix(data=two)
%rename_per_prefix(data=three)

What if

Suppose there were consistent transformations across all Excel sources.

After importing a data set data from Excel, you can use Proc CONTENTS and the mapping table to programmatically generate the SAS code that performs the transforming renames.

The mapping table and codegen become more complicated if you allow the import process to bring in variable names that require name literals to specify. (I.e. import brings in a variable named 'Provided Specimen #1'n

Example:

data name_map;
length xl_name sas_name $32;
input 
xl_name  sas_name; datalines;
Txnmy_cd  Taxonomy
Txnmy_cd1 Taxonomy
Txnmy_cd2 Taxonomy2
Txnmy_cd3 Taxonomy3
Txnmy_cd4 Taxonomy4
Spec_cd   ProvSpec1
Spec_cd1  ProvSpec1
Spec_cd2  ProvSpec2 
Spec_cd3  ProvSpec3
Spec_cd4  ProvSpec4
;

%macro rename_per_map(data=, map=name_map);
  %local renames lib mem;

  %let syslast = &data;
  %let lib = %scan(&syslast,1,.);
  %let mem = %scan(&syslast,2,.);

  proc contents noprint data=&data out=_names_(keep=name);
  proc sql noprint;
    select catx('=',xl_name,sas_name) 
    into :renames separated by ' '
    from _names_ source
    join &map    map
      on source.name = map.xl_name
    ;
  %if %length(%superq(renames)) %then %do;
  proc datasets nolist lib=&lib;
    modify &mem;
    rename %superq(renames);
  %end;
  proc delete data=_names_;
  run;
%mend;

* mimic data import;
data one;
call missing(ProvID,NPI,Txnmy_cd,Txnmy_cd2,Spec_cd1,Spec_cd2,Spec_cd3,Spec_cd4);
data two; /* tweaked to be consistent */
call missing(ProvID,NPI,Txnmy_cd1,Txnmy_cd2,Txnmy_cd3,Txnmy_cd4,Spec_cd,Spec_cd2);
data three;
call missing(ProvID,NPI,Txnmy_cd,Txnmy_cd2,Txnmy_cd3,Txnmy_cd4,Spec_cd1,Spec_cd2);
run;

* perform renames;

%rename_per_map(data=one)
%rename_per_map(data=two)
%rename_per_map(data=three)

Upvotes: 1

Related Questions