Reputation: 175
Say I have a dataset that consists of schoolID SAT_code
and student_name
. What I want to classify is that for those schoolID
with 'ABC'
would have same SAT_code
of 'East'
. While those with schoolID
with 'XYZ'
would have same SAT_code
of 'Midwest'
.
For a dataset like this:
schoolID SAT_code student_name
ABC Jasmine Smith
ABC Michael Jordan
ABC Madison Trump
XYZ Sarah Potter
XYZ Jim Fowler
XYZ Jack Black
. .
. .
. .
There are more than 30 schoolID besides that.
The easiest, but notorious one I could think of was to use if-then for 30 times.
data stateSAT;
set statestats;
if schoolID eq 'ABC' then SAT_code 'East';
else if schoolID eq 'XYZ' then SAT_code 'Midwest';
else if schoolID eq 'MNO' then SAT_code 'East';
and so forth.....
run;
Are there more efficient way, possibly having some kind of for-loop to resolve this issue?
Thanks.
Upvotes: 1
Views: 742
Reputation: 27518
Your long sequence of if/then
statements can be more cleanly stated with a SELECT
statement. It is similar to SQL CASE
or C switch
length SAT_code $20;
select (schoolID);
when ("ABC") SAT_code = 'East';
when ("XYZ") SAT_code = 'Midwest';
when ("MNO") SAT_code = 'East';
…
otherwise SAT_code = '???';
end;
However, with 30+ schoolIDs, you would be better off following @mjsqu advice of moving those schoolID mappings to another data structure.
Upvotes: 1
Reputation: 5452
This is best done using a lookup table:
Create a table with schoolID
and SAT_code
and then perform a merge with your source table:
data schoolsat;
infile datalines delimiter=',';
input schoolID $3 SAT_code $25;
datalines;
ABC,East
XYZ,Midwest
MNO,East
;
run;
This creates a table which maps the schoolID
values to SAT_code
values. Add all of the required combinations to this table.
Once you've done that, there are two ways of merging data in the SAS world. These are both examples of 'LEFT JOINs', which will keep all records from your statestats
table, regardless of whether there's a corresponding row in your new schoolsat
mapping table created above. If there's no corresponding schoolID
value in schoolsat
, SAS will leave a missing value for SAT_code
in the resulting table.
proc sql;
create table stateSAT as
select a.*,
b.SAT_code
from statestats a
left join schoolsat b
on a.schoolID = b.schoolID;
quit;
proc sort data=schoolsat;
by schoolID;
run;
proc sort data=statestats;
by schoolID;
run;
data stateSAT;
merge statestats (in=a)
schoolsat;
by schoolID;
if a;
run;
Upvotes: 2