Michael Fritz
Michael Fritz

Reputation: 23

How to randomly select variables in SAS?

I can find all sorts of information on how to randomly select observations in SAS which is a fairly easy task. This is not what I need though. I need to randomly select variables. What I want to do specifically is randomly choose 20 variables from my list of 159 variables and do this 50 times. I want to ensure diversity too. I have been spending about two days on this and am having no luck.

Upvotes: 2

Views: 1724

Answers (2)

Stu Sztukowski
Stu Sztukowski

Reputation: 12909

I'm glad that you asked this question, because I just developed a solution for that! Let's break down exactly what needs to be done, step-by-step.

Step 0: What do we need to do?

We need a way to take all of our variables and randomly select 20 of them while keeping them within the bounds of the SAS language rules.

We'll require:

  1. All variables in the dataset
  2. A way to re-sort them randomly
  3. A limit of 20 variables
  4. A way to loop this 50 times

Let's start with 1.

Step 1: Getting all the variables

sashelp.vcolumn provides a list of all variables within a dataset. Let's select them all.

proc sql noprint;
    create table all_vars as
        select name
        where libname = 'LIBRARYHERE' AND memname = 'HAVE'
    ;
quit;

This gets us a list of all variables within our dataset. Now, we need to sort them randomly.

Step 2: Making them random

SAS provides the rand function that allows you to pull from any distribution that you'd like. You can use call streaminit(seedhere) prior to the rand function to set a specific seed, creating reproducable results.

We'll simply modify our original SQL statement and order the dataset with the rand() function.

data _null_;
    call streaminit(1234);
run;

proc sql noprint;
    create table all_vars as
        select name
        from sashelp.vcolumn
        where libname = 'LIBRARYHERE' AND memname = 'HAVE'
        order by rand('uniform');
quit;

Now we've got all of our variables in a random order, distributed evenly by the uniform distribution.

Step 3: Limit to 20 variables

You can do this a few ways. One way is the obs= dataset option in separate procedures, another is the outobs= proc sql option. Personally, I like the obs= dataset option since it doesn't generate a warning in the log, and can be used in other procedures.

data _null_;
    call streaminit(1234);
run;

proc sql noprint outobs=20;
    create table all_vars as
        select name
        from sashelp.vcolumn
        where libname = 'LIBRARYHERE' AND memname = 'HAVE'
        order by rand('uniform');
quit;

Step 4: Loop it 50 times

We'll use SAS Macro Language to do this part. We can create 50 individual datasets this way, or switch the code up slightly and read them into macro variables.

%macro selectVars(loop=50, seed=1234);
    data _null_;
        call streaminit(&seed);
    run;

    %do i = 1 %to &loop;
        proc sql noprint outobs=20;
            create table all_vars&i as
                select name
                from sashelp.vcolumn
                where libname = 'LIBRARYHERE' AND memname = 'HAVE'
                order by rand('uniform')
        ;
        quit;
   %end;

%mend;
%selectVars;

Or, option 2:

%macro selectVars(loop=50, seed=1234);

    data _null_;
       call streaminit(&seed);
    run;

    %do i = 1 %to &loop;
        proc sql noprint outobs=20;
          select name
          into :varlist separated by ' '
          from sashelp.vcolumn
          where libname = 'LIBRARYHERE' AND memname = 'HAVE'
          order by rand('uniform')
        ;
        quit;
    %end;

%mend;
%selectVars;

The 2nd option will create a local macro variable called &varlist that will have the random 20 variables separated by spaces. This can be convenient for various modeling procs, and is preferable since it does not create a separate dataset each time.

Hope this helps!

Upvotes: 2

data _null_
data _null_

Reputation: 9109

You will need to treat your meta data as data and use SURVEYSELECT to select observations. Then perhaps put these names into macro variables but you did not mention the exact output you want.

data v;
   array rvars[159];
   run;
proc transpose data=v(obs=0) out=vars name=name;
   var rvars:;
   run;
proc surveyselect reps=4 sampsize=20 data=vars out=selection;
   run;
proc transpose data=selection out=lists(drop=_:);
   by replicate;
   var name;
   run;
proc print;
   run;
data _null_;
   set lists;
   by replicate;
   call symputx(cats('VLIST',_n_),catx(' ',of col:));
   run;
%put _global_;

Upvotes: 1

Related Questions