whymath
whymath

Reputation: 1394

Why proc sql and proc means produce different results?

I happened to find the following issue which confused me for several hours.

data test;
    input RandNo$ Trt$ Tmax;
    cards;
    K64 R   0.5
    K64 T   0.15
    K64 R   0.15
    K64 T   0.5
    K65 T   0.5
    K65 R   0.33
    K65 T   0.17
    K65 R   0.5
;
run;

proc sql noprint;
    create table SQL as 
    select RandNo, TRT, avg(Tmax) as Tmax_Mean
    from test
    group by RandNo, TRT
    ;
quit;

ods output Summary = Means;
proc means data = test n mean;
    class RandNo TRT;
    var Tmax;
run;
ods output;

proc sql;
    select a.RandNo, a.TRT, a.Tmax_Mean as SQL,
        b.Tmax_Mean as Means,
        SQL - Means as Dif
    from SQL as a
    left join Means as b
    on a.RandNo = b.RandNo and a.TRT = b.TRT
    ;
quit;

Output:

RandNo Trt SQL Mean Dif 
K64 R 0.325 0.325 0 
K64 T 0.325 0.325 -555E-19 
K65 R 0.415 0.415 0 
K65 T 0.335 0.335 -555E-19 

So why dose the results from proc means and proc sql differs from the other? Thx in advance.

PS: I have tried deleting the observations of 'K64' or 'K65' and the difference just disappear this time.

Upvotes: 0

Views: 456

Answers (2)

Richard
Richard

Reputation: 27498

The statistical engine beneath standard procedures (means, univariate, summary, etc...) is the same, however, SQL statistical engine as you discovered can have very small variation from the Procs engine.

As to why is more a question for the SAS developers. One possibility is the SQL engine may have an extra bit available from the treatment or representation of SQL ISO NULL versus SAS MISSING values (. through .Z), which in turn could affect the result.

You can view the underlying bits of the double precision representation using for RB8.

    put(SQL,RB8.) format=$hex16. as SQL_RB8,
    put(Means,RB8.) format=$hex16. as Means_RB8

 RandNo    Trt            SQL          Mean       Dif      SQL_RB8           Means_RB8.
 --------------------------------------------------------------------------------------------
     K64       R            0.325         0.325         0  CDCCCCCCCCCCD43F  CDCCCCCCCCCCD43F
     K64       T            0.325         0.325  -555E-19  CCCCCCCCCCCCD43F  CDCCCCCCCCCCD43F
     K65       R            0.415         0.415         0  90C2F5285C8FDA3F  90C2F5285C8FDA3F
     K65       T            0.335         0.335  -555E-19  703D0AD7A370D53F  713D0AD7A370D53F

When there is miniscule difference you see

  • CC... and CD... for .325
  • 70... and 71... for .335

The difference is in a very lower order bit. Lookup IEEE 754 if you want to learn more about the nitty gritty of storing decimal values in a double precision space.

Upvotes: 1

user667489
user667489

Reputation: 9569

I'm guessing that one proc applies fuzzing by default but the other doesn't. It's difficult to say exactly why, other than 'legacy reasons'.

Upvotes: 0

Related Questions