Reputation: 23
I can find all sorts of information on how to randomly select observations in SAS which is a fairly easy task. This is not what I need though. I need to randomly select variables. What I want to do specifically is randomly choose 20 variables from my list of 159 variables and do this 50 times. I want to ensure diversity too. I have been spending about two days on this and am having no luck.
Upvotes: 2
Views: 1724
Reputation: 12909
I'm glad that you asked this question, because I just developed a solution for that! Let's break down exactly what needs to be done, step-by-step.
Step 0: What do we need to do?
We need a way to take all of our variables and randomly select 20 of them while keeping them within the bounds of the SAS language rules.
We'll require:
Let's start with 1.
Step 1: Getting all the variables
sashelp.vcolumn
provides a list of all variables within a dataset. Let's select them all.
proc sql noprint;
create table all_vars as
select name
where libname = 'LIBRARYHERE' AND memname = 'HAVE'
;
quit;
This gets us a list of all variables within our dataset. Now, we need to sort them randomly.
Step 2: Making them random
SAS provides the rand
function that allows you to pull from any distribution that you'd like. You can use call streaminit(seedhere)
prior to the rand
function to set a specific seed, creating reproducable results.
We'll simply modify our original SQL statement and order the dataset with the rand()
function.
data _null_;
call streaminit(1234);
run;
proc sql noprint;
create table all_vars as
select name
from sashelp.vcolumn
where libname = 'LIBRARYHERE' AND memname = 'HAVE'
order by rand('uniform');
quit;
Now we've got all of our variables in a random order, distributed evenly by the uniform
distribution.
Step 3: Limit to 20 variables
You can do this a few ways. One way is the obs=
dataset option in separate procedures, another is the outobs=
proc sql
option. Personally, I like the obs=
dataset option since it doesn't generate a warning in the log, and can be used in other procedures.
data _null_;
call streaminit(1234);
run;
proc sql noprint outobs=20;
create table all_vars as
select name
from sashelp.vcolumn
where libname = 'LIBRARYHERE' AND memname = 'HAVE'
order by rand('uniform');
quit;
Step 4: Loop it 50 times
We'll use SAS Macro Language to do this part. We can create 50 individual datasets this way, or switch the code up slightly and read them into macro variables.
%macro selectVars(loop=50, seed=1234);
data _null_;
call streaminit(&seed);
run;
%do i = 1 %to &loop;
proc sql noprint outobs=20;
create table all_vars&i as
select name
from sashelp.vcolumn
where libname = 'LIBRARYHERE' AND memname = 'HAVE'
order by rand('uniform')
;
quit;
%end;
%mend;
%selectVars;
Or, option 2:
%macro selectVars(loop=50, seed=1234);
data _null_;
call streaminit(&seed);
run;
%do i = 1 %to &loop;
proc sql noprint outobs=20;
select name
into :varlist separated by ' '
from sashelp.vcolumn
where libname = 'LIBRARYHERE' AND memname = 'HAVE'
order by rand('uniform')
;
quit;
%end;
%mend;
%selectVars;
The 2nd option will create a local macro variable called &varlist
that will have the random 20 variables separated by spaces. This can be convenient for various modeling procs, and is preferable since it does not create a separate dataset each time.
Hope this helps!
Upvotes: 2
Reputation: 9109
You will need to treat your meta data as data and use SURVEYSELECT to select observations. Then perhaps put these names into macro variables but you did not mention the exact output you want.
data v;
array rvars[159];
run;
proc transpose data=v(obs=0) out=vars name=name;
var rvars:;
run;
proc surveyselect reps=4 sampsize=20 data=vars out=selection;
run;
proc transpose data=selection out=lists(drop=_:);
by replicate;
var name;
run;
proc print;
run;
data _null_;
set lists;
by replicate;
call symputx(cats('VLIST',_n_),catx(' ',of col:));
run;
%put _global_;
Upvotes: 1