Lepidopterist
Lepidopterist

Reputation: 411

PROC SQL with GROUP command extremely slow. Why? Workaround possible?

I have a MACRO which takes a data set D and essentially outputs k disjoint datasets, D_1,...,D_k. The value k is not fixed and depends on properties of the data that are not known in advance. We can assume that k is not larger than 10, though.

The dataset D contains the variables x and y, and I want to overlay the line/scatter plots of x and y for each of D_i over each other. In my particular case x is time, and I want to see the output y for each D_i and compare them to each other.

Hopefully that was clear.

How can I do this? I don't know k in advance, so I need some sort of %do loop. But it doesn't seem that I can put a do loop inside "proc sgplot".

I might be able to make a macro that includes a very long series of commands, but I'm not sure.

How can I overlay these plots in SAS?

EDIT: I am including for reference why I am trying to avoid doing a PROC SGPLOT with the GROUP clause. I tried the following code and it is taking over 30 minutes to compute (I canceled the calculation after this, so I don't know how long it will actually take). PROC SQL runs quite quickly, the program is stuck on PROC SGPLOT.

    proc sql;
    create table dataset as select    
        date, product_code, sum(num_of_records) as total_rec
        from &filename
            group by product_code, data
                order by product_code, date
    ;
quit;

PROC SGPLOT Data = dataset;
    scatter x = date   y = total_rec/group=product_code;
    title "Total records by product code";
run;

The number of observations in the file is 76,000,000.

Upvotes: 1

Views: 490

Answers (1)

Joe
Joe

Reputation: 63424

What you should do is either change your macro to produce one dataset with a variable d_i (or whatever you can logically name it) which identifies which dataset it would've gone to (or identifies it with whatever determines what dataset it would've gone to), or post-macro combine the datasets.

Then, you can use group to overlay your plots. So for example:

data my_data;
  call streaminit(7);
  do d_i = 1 to 5;
    y = 10;    
    x = 0;
    output;
    do x = 1 to 10;
      y + round(rand('Uniform')*3,.1)-1.5;
      output;
    end;
  end;
run;

proc sgplot data=my_data;
  series x=x y=y/group=d_i;
run;

Upvotes: 0

Related Questions