Reputation: 147
There are other threads out there about randomly selecting rows using other languages, and furthermore my question involves how to define variables from the columns of each randomly selected row.
First I import my data:
proc import OUT = WORK.ROWS
DATAFILE = "C:\rows.xlsx"
DBMS = EXCEL REPLACE; GETNAMES = YES;
run;
proc print;
run;
Which consists of 10 rows, each row containing a pair of two variables (var1, var2) in separate columns. It looks like this:
obs var1 var2
1 0.8828 0.2245
2 0.8833 0.3109
3 0.8699 0.1579
4 0.9035 0.2993
5 0.9641 0.3590
6 0.8846 0.2542
7 0.8752 0.1343
8 0.9309 0.1188
9 0.9018 0.1761
10 0.8832 0.1439
Then, within a DO loop,
DO n = 1 TO 1000; *number of simulations to run;
I would like to randomly draw a single row from the input data file and copy the value of the two variables (var1 and var2) from the row I randomly drew for further use in the same DO loop. In other words, for each iteration of the DO loop, I need to define a new random pair of variables that originated from the same row.
For example, the random values for simulations n = 1 through n = 4 for var1 and var2 could have come from obs (rows) 2, 2, 10, 4:
Simulation #
n = 1
var1 = 0.8846
var2 = 0.2542
n = 2
var1 = 0.8846
var2 = 0.2542
n = 3
var1 = 0.9309
var2 = 0.1188
n = 4
var1 = 0.8832
var2 = 0.1439
Thank you in advance for your help.
Upvotes: 0
Views: 2083
Reputation: 8513
You can do this using the point=
option on the set
statement. First figure out how many observations are in your dataset that you want to pull the data from. Save the number of observations into a macro var called nobs
:
data _null_;
set sashelp.class nobs=i;
if _n_ eq 2 then stop;
call symputx ('nobs',i);
run;
Check we're getting the expected value:
%put &=nobs;
Specify how many times you want to loop:
%let loops = 1000;
Now in a dataset, initiate a loop with the set statement inside. We're going to calculate a random number between 1 and the number of rows in the table. We're then going to issue a set
statement pointing directly to that row. Once we have that, do your thing and output the record. Once we have iterated the desired number of times, force the datastep to terminate:
data want;
do cnt=1 to &loops;
random_obs = floor(rand("Uniform")*100/(100/&nobs))+1; * BETTER CHECK THIS MATH IF YOU NEED TO BE REALLY ACCURATE;
set sashelp.class(keep=age sex) point=random_obs;
* DO YOUR THING;
output;
end;
stop;
run;
EDIT : I forgot to mention that sometimes it's useful to select the same 'random' observations every time you run it. If you want to do this to assist with testing you will need to add this line to the top of your dataset:
call streaminit(123); /* set random number seed */
Upvotes: 0
Reputation: 63424
This is effectively a very tiny bootstrap, with 1000 replicates of sample size one. If that's exactly what you want, then, bingo, you have it here, in PROC SURVEYSELECT:
data have;
input obs var1 var2;
datalines;
1 0.8828 0.2245
2 0.8833 0.3109
3 0.8699 0.1579
4 0.9035 0.2993
5 0.9641 0.3590
6 0.8846 0.2542
7 0.8752 0.1343
8 0.9309 0.1188
9 0.9018 0.1761
10 0.8832 0.1439
;;;;
run;
proc surveyselect data=have out=want seed=7 outhits
method=urs sampsize=1 rep=1000;
run;
Now, if you really want a normal bootstrap analysis (where sample size is bigger than one - usually it's the same as your initial sample's size), you can just jigger with the sampsize=
and rep=
quantities until you get what you expect.
For further reading, Don't be LOOPy by David Cassell is the classic paper on the subject.
Upvotes: 2