Reputation: 2999
I have a SAS Data set called coaches_assistants
with the following structure. There are always only two records per TeamID
.
TeamID Team_City CoachCode
123 Durham Head_242
123 Durham Assistant_876
124 London Head_876
124 London Assistant_922
125 Bath Head_667
125 Bath Assistant_786
126 Dover Head_544
126 Dover Assistant_978
... ... ....
What I'd like to do with this is to create a data set with an extra field called AssistantCode
and make it look like:
TeamID Team_City HeadCode AssistantCode
123 Durham 242 876
124 London 876 922
125 Bath 667 786
126 Dover 544 978
... ... ... ...
If possible, I'd like to do this in a single DATA step (though I recognize that I might need a PROC SORT step first). I know how to do it in python or ruby or any traditional scripting languages, but I don't know how to do it in SAS.
What's the best way to do this?
Upvotes: 0
Views: 3976
Reputation: 9618
Here are two possible solutions (one using a data step as requested and another using PROC SQL):
data have;
length TeamID $3 Team_City CoachCode $20;
input TeamID $ Team_City $ CoachCode $;
datalines;
123 Durham Head_242
123 Durham Assistant_876
124 London Head_876
124 London Assistant_922
125 Bath Head_667
125 Bath Assistant_786
126 Dover Head_544
126 Dover Assistant_978
run;
/* A data step solution */
proc sort data=have;
by TeamID;
run;
data want1(keep=TeamID Team_City HeadCode AssistantCode);
/* Define all variables, retain the new ones */
length TeamID $3 Team_City $20 HeadCode $3 AssistantCode $3;
retain HeadCode AssistantCode;
set have;
by TeamID;
if CoachCode =: 'Head'
then HeadCode = substr(CoachCode,6,3);
else AssistantCode = substr(CoachCode,11,3);
if last.TeamID;
run;
/* An SQL solution */
proc sql noprint;
create table want2 as
select TeamID
, max(Team_City) as Team_City
, max(CASE WHEN CoachCode LIKE 'Head%'
THEN substr(CoachCode,6,3) ELSE ' '
END) LENGTH=3 as HeadCode
, max(CASE WHEN CoachCode LIKE 'Assistant%'
THEN substr(CoachCode,11,3) ELSE ' '
END) LENGTH=3 as AssistantCode
from have
group by TeamID;
quit;
PROC SQL has the advantage of not requiring you to sort the data in advance.
Upvotes: 1
Reputation: 63424
While it's possible to do in one datastep, I generally find that this sort of problem is better served in PROC TRANSPOSE. Less manual coding this way and more flexibility for new things (say a new value "HeadAssistant" appeared, this would instantly work).
data have;
length coachcode $25;
input TeamID Team_City $ CoachCode $;
datalines;
123 Durham Head_242
123 Durham Assistant_876
124 London Head_876
124 London Assistant_922
125 Bath Head_667
125 Bath Assistant_786
126 Dover Head_544
126 Dover Assistant_978
;;;;
run;
data have_t;
set have;
id=scan(coachcode,1,'_');
val = scan(coachcode,2,'_');
keep teamId team_city id val;
run;
proc transpose data=have_t out=want(drop=_name_);
by teamID team_city;
id id;
var val;
run;
Upvotes: 2
Reputation: 57686
This assumes you've sorted the data by teamID
, and head coaches always come before assistants. Caveat: untested (I really need to get access to SAS again....)
data want (drop=nc coachcode);
set have;
length headcode assistantcode $3;
retain headcode;
by teamid;
nc = length(coachcode);
if substr(coachcode, 1, 4) = 'Head' then
headcode = substr(coachcode, nc-2, nc);
else
assistantcode = substr(coachcode, nc-2, nc);
if last.teamid;
run;
Upvotes: 0