Reputation: 301
I have a data set with 1000 observations and I want to select 10 of them at random. As I understand it, I need to use RANUNI or RAND, but I can't figure out how to implement it.
Thanks
Upvotes: 1
Views: 842
Reputation: 1770
This macro randomly selects observations from a dataset.
Input (first 5 of 1000 rows):
+---------+---+----+
| counter | x | y  |
+---------+---+----+
|       1 | 2 |  2 |
|       2 | 3 |  6 |
|       3 | 4 | 12 |
|       4 | 5 | 20 |
|       5 | 6 | 30 |
+---------+---+----+
data have;
do counter=1 to 1000;
x=counter+1;
y=counter*x;
output;
end;
run;
Macro:
%macro select_random_obs(libname,memname,num);%macro d;%mend d;
/*
  libname - library that contains the dataset
  memname - name of the dataset
  num     - number of observations to select at random
*/
proc sql noprint; /* get the number of observations in the dataset (in case it is not a fixed value) */
  select nobs into :max
  from dictionary.tables
  where libname="%upcase(&libname)" and memname="%upcase(&memname)";
quit;
%let rand_list=; /* macro variable that will contain the selected observation numbers */
data _null_; /* build rand_list: &num distinct random observation numbers */
  length tList $32000 str $12;
  n=0;
  do while (n<&num);
    if n=0 then tList="";
    repeat:
    u = rand("Uniform");
    k = ceil(&max*u); /* random integer between 1 and &max */
    str = strip(put(k,best12.)); /* put() converts the number to text */
    do i=1 to countw(tList,' ');
      if scan(tList,i,' ') = str then goto repeat; /* already drawn, so draw again */
    end;
    tList=catx(' ',tList,str);
    n=n+1;
  end;
  call symputx('rand_list',tList);
run;
%put &=rand_list;
data want; /* keep only the observations whose row number is in rand_list */
  set &libname..&memname; /* read the dataset named by the macro parameters */
  if _N_ in (&rand_list);
run;
%mend select_random_obs;
%select_random_obs(work,have,10);
Output:
+---------+-----+--------+
| counter |   x |      y |
+---------+-----+--------+
|      33 |  34 |   1122 |
|     344 | 345 | 118680 |
|     466 | 467 | 217622 |
|     478 | 479 | 228962 |
|     552 | 553 | 305256 |
|     861 | 862 | 742182 |
|     890 | 891 | 792990 |
|     904 | 905 | 818120 |
|     922 | 923 | 851006 |
|     941 | 942 | 886422 |
+---------+-----+--------+
Upvotes: 1
Reputation: 4937
There are many ways to do this, but the simplest is probably PROC SURVEYSELECT with simple random sampling (method=srs):
data have;
do x=1 to 1000;
output;
end;
run;
proc surveyselect data=have out=want seed=123 noprint
method=srs
sampsize=10;
run;
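Since the question asks about RAND specifically, here is a minimal sketch of one of those other ways: attach a uniform random key to every row, sort by that key, and keep the first 10 rows. The dataset name `keyed`, the variable name `_u`, and the seed value are illustrative choices, not anything taken from the posts above.

```sas
/* Sketch: sample 10 rows without replacement by sorting on a random key. */
data keyed;
  set have;
  if _n_ = 1 then call streaminit(123); /* fix the seed so the sample is reproducible */
  _u = rand("Uniform");                 /* hypothetical helper variable: one random key per row */
run;

proc sort data=keyed;
  by _u;                                /* a random key gives a random row order */
run;

data want;
  set keyed(obs=10 drop=_u);            /* first 10 rows of the shuffled data */
run;
```

This reads the data more than once and sorts it, so for large tables PROC SURVEYSELECT is likely the better choice. The sketch is mainly useful for seeing how RAND fits into the task.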
Upvotes: 2