Reputation: 355
I've seen a fair number of different solutions to this using MYSQL but some of the functions that they use don't work within SAS. I have a very large list of customer records that looks like this:
DATE ID .........
1/31/13 1 ............
1/31/13 2 ...........
2/28/13 1 ............
3/31/13 1 ............
3/31/13 2 ............
And I need to select only 1 row per ID group. The thing is I don't want to just choose the first date for each I want to choose a random or arbitrary(whichever is easier to do) row from the available options. Does anybody know a way I could do this?
Upvotes: 3
Views: 2274
Reputation: 4282
Barebones solution:
Proc SurveySelect data=Have out=Want noprint
Method = urs
N = 1
outhits
rep = 1;
Strata Id ;
run;
Depending on your goal, the answer could change slightly. If you're bootstrapping (which it appears you may be), I found this paper quite useful: Don't Be Loopy: Re-Sampling and Simulation the SAS® Way
Upvotes: 2
Reputation: 170
I'm not sure how to do it from one big file but I would write a loop to split up the code into id groups and then do a random sampling.
%macro MACRO();
proc sql noprint;
select ID into :ID separated by " "
from Original_table
;
quit;
proc sql noprint;
select count (*) into :NumIDs
from Original_table
;
quit;
%let NumID = &NumIDs;
%do i = 1 %to &NumID %by 1;
%let loopID = %scan(&ID, &i, " ");
proc sql;
create table ID_&loopID. as
select *
from Original_table
where ID = &loopID.
;
quit;
proc surveyselect data = ID&loopID. method = SRS rep = 1
sampsize = 1 out = RandomID_&loopID.;
id _all_;
run;
proc append base = Id_Random_all
data = RandomID_&loopID.
;
quit;
%end;
%mend;
%MACRO();
The proc append will then add all of them together into one file to look at. I'm sure there is an easier way to do this but this is what I thought of.
Upvotes: 0