Random sample from another table's column

Question

I am trying to figure out how to populate a "fake" column by choosing randomly from another table column. So far this was easy using an array and the rantbl() function as there were not a lot of modalities.

data want;
set have;

array values[2] $10 _temporary_ ('NO','YES');
value=values[rantbl(0,0.5,0.5)];

array start_dates[4] _temporary_ (1735689600,1780358400,1798848000,1798848000);
START_DATE=start_dates[rantbl(0,0.25,0.25,0.25,0.25)];

format START_DATE datetime20.;
run;

However, what happens if there are, for example, more than 150 modalities in the other table? Is there a way to load all the modalities from another table into an array? Or better, to populate the new "fake" column with modalities from another table's column according to their distribution in that table?

PeterClemmensen · Accepted Answer

I'm not entirely sure, but here's how I interpret your request and how I would solve it.

You have a table one. You want to create a new data set want with an additional column. This column should have values that are sampled from a pool of values given in yet another data set two in column y. You want to simulate the new column in the want data set according to the distribution of y in the two data set.

So, in the example below, there should be a .5 chance of simulating y = 3 and a .25 chance each for y = 1 and y = 2.

I think the way to go is not using arrays at all. See if this helps you.

data one;
   do x = 1 to 1e4;
      output;
   end;
run;

data two;
   input y;
   datalines;
1
2
3
3
;

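data want;
   set one;
   /* Pick a random row number between 1 and n, the number of rows in two */
   p = ceil(rand('uniform')*n);
   /* Read row p of two directly; nobs= makes n available at compile time */
   set two(keep = y) nobs = n point = p;
run;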
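
If you need the simulated column to be reproducible between runs, one option is to seed the random number stream before the first draw. This is only a minimal sketch, assuming the data sets one and two from above; the seed 12345 is an arbitrary value chosen for illustration.

data want;
   /* Seed the stream once; 12345 is an arbitrary, illustrative seed */
   if _n_ = 1 then call streaminit(12345);
   set one;
   /* Same sampling logic as above: random row number between 1 and n */
   p = ceil(rand('uniform')*n);
   set two(keep = y) nobs = n point = p;
run;

Rerunning this step with the same seed should give the same y values each time.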

To verify that the new column resembles the distribution from the two data set:

proc freq data = want;
   tables y / nocum;
run;
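
For comparison, the same table on the source data set shows the target distribution (25% for y = 1, 25% for y = 2, 50% for y = 3). The percentages in want should be close to these, though not identical, since the values are drawn at random.

proc freq data = two;
   tables y / nocum;
run;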
