Reputation: 1031
HAVE
is a wide dataset with names stored in the variables name1-name250
. Here are the first two obs and several vars:
episode name1 name2 name3 name4 name5 ...
121 DETWEILER.TJ.M BLUMBERG.MIKEY GRISWOLD.GUS.N
451 BOB.KING KID.HUSTLER FINSTER.MS PRICKLEY.PETEY GRISWOLD.GUS
...
Some of the names need to be corrected. The corrections are stored in the dataset FIXES
:
goodname badname
DETWEILER.TJ DETWEILER.TJ.M
GRISWOLD.GUS GRISWOLD.GUS.N
I simply need to find the badname
from FIXES
that appear in HAVE
and replace them with goodname
. I currently loop through name1-name250
in a data step for each row in FIXES
to accomplish this:
data WANT;
set HAVE;
array name {*} name1-name250;
do i=1 to dim(name);
if name{i} = "DETWEILER.TJ.M" then name{i} = "DETWEILER.TJ";
else if name{i} = "GRISWOLD.GUS.N" then name{i} = "GRISWOLD.GUS";
/*manually add other corrections from FIXES dataset*/
else name{i} = name{i};
end;
run;
This feels really inefficient. What is a better way?
Upvotes: 0
Views: 55
Reputation: 51601
When you have a simple exact match translation like that a FORMAT is a simple way to implement it. You can convert your "lookup" data into a format.
data fixes ;
input goodname :$30. badname :$30. ;
cards;
DETWEILER.TJ DETWEILER.TJ.M
GRISWOLD.GUS GRISWOLD.GUS.N
;
data format ;
retain fmtname '$FIXNAME' ;
set fixes end=eof;
rename badname=start goodname=label;
run;
proc format cntlin=format;
run;
Then just use the format to convert the names.
data want;
set have;
array name name1-name5;
do over name;
name=put(name,$fixname30.);
end;
run;
Result:
episode name1 name2 name3 name4 name5
121 DETWEILER.TJ BLUMBERG.MIKEY GRISWOLD.GUS
451 BOB.KING KID.HUSTLER FINSTER.MS PRICKLEY.PETEY GRISWOLD.GUS
Upvotes: 2