Alex A.

Reputation: 5586

Send subset of data to SAS DS2 thread

I have a dataset with 5 groups and I want to use the DS2 procedure in SAS to concurrently compute group means.

Simulated dataset:

data sim;
    call streaminit(7);
    do group = 1 to 5;
        do pt = 1 to 500;
            x = rand('ERLANG', group);
            output;
        end;
    end;
run;

I envision each of 5 threads receiving the subset of the data for one particular group, with the mean of x calculated on each subset like so:

proc ds2;
    thread t / overwrite=yes;
        dcl double n sum mean;

        method init();
            n = 0;
            sum = 0;
            mean = .;
        end;

        method run();
            set sim;    /* Or perhaps a subsetted dataset */
            sum + x;
            n + 1;
        end;

        method term();
            mean = sum / n;
            output;
        end;
    endthread;

    ...
quit;

The problem is that if you call a thread that processes a dataset as below, rows are sent to the 5 threads willy-nilly (i.e., irrespective of group).

    data test / overwrite=yes;
        dcl thread t t_instance;
        method run();
            set from t_instance threads=5;
        end;
    enddata;
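
One way to see how the rows are being distributed is to tag each row with the thread that read it. This is only a minimal sketch: it assumes the DS2 automatic variable _threadid_ is available inside the thread program, and the names t_diag, tid, and thread_map are made up for illustration.

proc ds2;
    /* Hypothetical diagnostic thread: records which thread read each row */
    thread t_diag / overwrite=yes;
        dcl int tid;

        method run();
            set sim;
            tid = _threadid_;   /* copy the automatic variable so it is kept in the output */
            output;
        end;
    endthread;
    run;

    data thread_map / overwrite=yes;
        dcl thread t_diag t_instance;
        method run();
            set from t_instance threads=5;
        end;
    enddata;
    run;
quit;

/* Cross-tabulate group against thread id */
proc freq data=thread_map;
    tables group * tid / norow nocol nopercent;
run;

If each group were handled by a single thread, every row of the resulting table would have exactly one nonzero cell.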

How can I tell SAS to subset the data by group and pass each subset to its own thread?

Upvotes: 4

Views: 1136

Answers (1)

Joe

Reputation: 63434

I believe you have to add a by statement inside the run() method, and then add some code to handle the by groups (i.e., if you want a row output at last.group, add code to do so and reset the totals). DS2 is supposed to be smart and use one thread per by group (or, at least, process an entire by group within a single thread). I'm not sure you will see a great improvement if you're reading from disk, since the time saved by threading is probably small compared with the disk read time, but who knows.

The changes below are in run() (term() is now empty, since the mean is output per by group), plus a proc means step added to check the result.

data sim;
    call streaminit(7);
    do group = 1 to 5;
        do pt = 1 to 500;
            x = rand('ERLANG', group);
            output;
        end;
    end;
run;

proc ds2;
    thread t / overwrite=yes;
        dcl double n sum mean ;

        method init();
            n = 0;
            sum = 0;
            mean = .;
        end;

        method run();
            set sim;
            by group;
            sum + x;
            n + 1;
            if last.group then do;
                mean = sum / n;
                output;
                n=0;
                sum=0;
            end;
        end;

        method term();
        end;
    endthread;
    run;

    data test / overwrite=yes;
        dcl thread t t_instance;
        method run();
            set from t_instance threads=5;
        end;
    enddata;
    run;
quit;

proc means data=sim;
    class group;
    var x;
run;
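
To compare the two results side by side rather than by eye, something like the following could be used. It is a sketch under the assumption that test contains the variables group, n, and mean as written by the thread above; the dataset names ref_means, test_sorted, and compare and the variables ref_mean and abs_diff are hypothetical.

/* Reference means per group, written to a dataset instead of the listing */
proc means data=sim noprint nway;
    class group;
    var x;
    output out=ref_means mean=ref_mean;
run;

/* Rows come back from the threads in no guaranteed order, so sort before merging */
proc sort data=test out=test_sorted;
    by group;
run;

data compare;
    merge test_sorted(keep=group n mean)
          ref_means(keep=group ref_mean);
    by group;
    abs_diff = abs(mean - ref_mean);   /* should be zero up to floating-point error */
run;

proc print data=compare noobs;
run;

If compare shows more than one row for a group, or n is not 500, that would suggest a by group was split across threads after all.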

Upvotes: 3
