Reputation: 1
I have a SAS dataset that looks like this:
id | Date | ...
1 17 Jun
1 19 Jun
2 17 Jun
2 19 Jun
2 21 Jun
3 12 May
each id represents a unique person. I want to keep only 1 row for each unique person, however, still keep the date in dataset. TO achieve this, I need to transform the table into format such as:
id | Date1 | Date2 | Date 3
1 17 Jun 19 Jun
2 17 Jun 19 Jun 21 Jun
3 12 May
If only 1 date has been assigned to that person, then keep the date2 and date3 as missing value.
The full dataset I'm using contains thousands of observations with over 180 different days. However, a unique person will at most be assigned to 5 different days.
Any help appreciated
Upvotes: 0
Views: 194
Reputation: 28441
Using Proc Transpose, then using a Data Step (and borrowing Keith's data).
Both ways need the data sorted by ID.
data have;
input id Date :date9.;
format date date9.;
datalines;
1 17Jun2012
1 19Jun2012
2 17Jun2012
2 19Jun2012
2 21Jun2012
3 12May2012
4 01JAN2013
4 02JAN2013
4 03JAN2013
4 04JAN2013
4 05JAN2013
;
run;
proc sort data=have;
by id;
run;
Proc transpose data=have out=transpose(drop=_name_) prefix=DATE;
by id;
run;
data ds(drop=cnt date);
retain date1 date2 date3 date4 date5;
format date1 date2 date3 date4 date5 mmddyy10.;
set have;
by id;
if first.id then cnt=1;
select(cnt);
when(1) date1=date;
when(2) date2=date;
when(3) date3=date;
when(4) date4=date;
when(5) date5=date;
otherwise;
end;
cnt+1;
if last.id then do;
output;
call missing(of date1-date5);
end;
run;
Upvotes: 0
Reputation: 7602
PROC SUMMARY has functionality to do this, using the IDGROUP statement. The code below will transpose the data and create 5 date columns (specified by out[5]), in date order (specified by min(date)). If you want more information on how this works then check the IDGROUP statement in the PROC MEANS / SUMMARY documentation.
data have;
input id Date :date9.;
format date date9.;
datalines;
1 17Jun2012
1 19Jun2012
2 17Jun2012
2 19Jun2012
2 21Jun2012
3 12May2012
;
run;
proc summary data=have nway;
class id;
output out=want (drop=_:)
idgroup(min(date) out[5] (date)=);
run;
Upvotes: 3