Reputation: 5586
I have a dataset with 5 groups and I want to use the DS2 procedure in SAS to concurrently compute group means.
Simulated dataset:
data sim;
  call streaminit(7);
  do group = 1 to 5;
    do pt = 1 to 500;
      x = rand('ERLANG', group);
      output;
    end;
  end;
run;
How I envision it working is that each of 5 threads receives a subset of the data corresponding to a particular group. The mean of x is calculated on each subset like so:
proc ds2;
thread t / overwrite=yes;
  dcl double n sum mean;
  method init();
    n = 0;
    sum = 0;
    mean = .;
  end;
  method run();
    set sim; /* Or perhaps a subsetted dataset */
    sum + x;
    n + 1;
  end;
  method term();
    mean = sum / n;
    output;
  end;
endthread;
...
quit;
The problem is that if you call the thread from a data program like the one below, rows are sent to the 5 threads willy-nilly (i.e., irrespective of group).
data test / overwrite=yes;
  dcl thread t t_instance;
  method run();
    set from t_instance threads=5;
  end;
enddata;
How can I tell SAS to subset the data by group and pass each subset to its own thread?
Upvotes: 4
Views: 1136
Reputation: 63434
I believe you have to add the by statement inside the run() method, and then add some code to deal with the by group (i.e., if you want it to output on last.group, add code to do so and then clear the totals). DS2 is supposed to be smart and use one thread per by group (or, at least, process an entire by group per thread). I'm not sure you will see a great improvement if you're reading from disk, since the time saved by threading is probably small compared with the disk read time, but who knows.
The only changes below are in run(), plus an added proc means to check myself.
data sim;
  call streaminit(7);
  do group = 1 to 5;
    do pt = 1 to 500;
      x = rand('ERLANG', group);
      output;
    end;
  end;
run;
proc ds2;
thread t / overwrite=yes;
  dcl double n sum mean ;
  method init();
    n = 0;
    sum = 0;
    mean = .;
  end;
  method run();
    set sim;
    by group; /* makes first.group and last.group available */
    sum + x;
    n + 1;
    if last.group then do;
      /* end of a by group: write its mean, then clear the totals */
      mean = sum / n;
      output;
      n=0;
      sum=0;
    end;
  end;
  method term();
  end;
endthread;
run;
data test / overwrite=yes;
  dcl thread t t_instance;
  method run();
    set from t_instance threads=5;
  end;
enddata;
run;
quit;
proc means data=sim;
  class group;
  var x;
run;
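Since I'm only fairly sure that an entire by group stays on one thread, here is a rough sketch of one way to check it, not a definitive test. It records the DS2 automatic variable _threadid_ alongside each group's result. The names t_check, check, cnt, total, mean_x and thread_id are made up for this illustration, and the accumulators are retained and updated with explicit assignments rather than relying on how the code above carries them between rows.

```sas
proc ds2;
thread t_check / overwrite=yes;
  dcl double cnt total mean_x;   /* hypothetical accumulator names */
  dcl int thread_id;             /* hypothetical column holding the thread number */
  retain cnt total;              /* carry the running totals across rows */
  keep group cnt mean_x thread_id;
  method run();
    set sim;
    by group;
    /* start fresh at the first row of each by group seen by this thread */
    if first.group then do;
      cnt = 0;
      total = 0;
    end;
    total = total + x;
    cnt = cnt + 1;
    /* at the last row of the by group, record the mean and the thread */
    if last.group then do;
      mean_x = total / cnt;
      thread_id = _threadid_;
      output;
    end;
  end;
endthread;
run;
data check / overwrite=yes;
  dcl thread t_check t_instance;
  method run();
    set from t_instance threads=5;
  end;
enddata;
run;
quit;

proc print data=check;
run;
```

If check has exactly one row per group and each cnt is 500, every group was processed whole by a single thread in that run. If a group shows up on more than one row with smaller counts, it was split across threads and the partial results would still need to be combined in the data program.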
Upvotes: 3