Why proc sql and proc means produce different results?

Question

I happened to find the following issue which confused me for several hours.

data test;
    input RandNo$ Trt$ Tmax;
    cards;
    K64 R   0.5
    K64 T   0.15
    K64 R   0.15
    K64 T   0.5
    K65 T   0.5
    K65 R   0.33
    K65 T   0.17
    K65 R   0.5
;
run;

proc sql noprint;
    create table SQL as 
    select RandNo, TRT, avg(Tmax) as Tmax_Mean
    from test
    group by RandNo, TRT
    ;
quit;

ods output Summary = Means;
proc means data = test n mean;
    class RandNo TRT;
    var Tmax;
run;
ods output;

proc sql;
    select a.RandNo, a.TRT, a.Tmax_Mean as SQL,
        b.Tmax_Mean as Means,
        SQL - Means as Dif
    from SQL as a
    left join Means as b
    on a.RandNo = b.RandNo and a.TRT = b.TRT
    ;
quit;

Output:

RandNo Trt SQL Mean Dif 
K64 R 0.325 0.325 0 
K64 T 0.325 0.325 -555E-19 
K65 R 0.415 0.415 0 
K65 T 0.335 0.335 -555E-19

So why dose the results from proc means and proc sql differs from the other? Thx in advance.

PS: I have tried deleting the observations of 'K64' or 'K65' and the difference just disappear this time.

Richard · Accepted Answer

The statistical engine beneath standard procedures (means, univariate, summary, etc...) is the same, however, SQL statistical engine as you discovered can have very small variation from the Procs engine.

As to why is more a question for the SAS developers. One possibility is the SQL engine may have an extra bit available from the treatment or representation of SQL ISO NULL versus SAS MISSING values (. through .Z), which in turn could affect the result.

You can view the underlying bits of the double precision representation using for RB8.

    put(SQL,RB8.) format=$hex16. as SQL_RB8,
    put(Means,RB8.) format=$hex16. as Means_RB8

 RandNo    Trt            SQL          Mean       Dif      SQL_RB8           Means_RB8.
 --------------------------------------------------------------------------------------------
     K64       R            0.325         0.325         0  CDCCCCCCCCCCD43F  CDCCCCCCCCCCD43F
     K64       T            0.325         0.325  -555E-19  CCCCCCCCCCCCD43F  CDCCCCCCCCCCD43F
     K65       R            0.415         0.415         0  90C2F5285C8FDA3F  90C2F5285C8FDA3F
     K65       T            0.335         0.335  -555E-19  703D0AD7A370D53F  713D0AD7A370D53F

When there is miniscule difference you see

CC... and CD... for .325
70... and 71... for .335

The difference is in a very lower order bit. Lookup IEEE 754 if you want to learn more about the nitty gritty of storing decimal values in a double precision space.

Why proc sql and proc means produce different results?

Answers (2)

Related Questions