Reputation: 1394
I happened to find the following issue which confused me for several hours.
data test;
input RandNo$ Trt$ Tmax;
cards;
K64 R 0.5
K64 T 0.15
K64 R 0.15
K64 T 0.5
K65 T 0.5
K65 R 0.33
K65 T 0.17
K65 R 0.5
;
run;
proc sql noprint;
create table SQL as
select RandNo, TRT, avg(Tmax) as Tmax_Mean
from test
group by RandNo, TRT
;
quit;
ods output Summary = Means;
proc means data = test n mean;
class RandNo TRT;
var Tmax;
run;
ods output;
proc sql;
select a.RandNo, a.TRT, a.Tmax_Mean as SQL,
b.Tmax_Mean as Means,
SQL - Means as Dif
from SQL as a
left join Means as b
on a.RandNo = b.RandNo and a.TRT = b.TRT
;
quit;
Output:
RandNo Trt SQL Mean Dif
K64 R 0.325 0.325 0
K64 T 0.325 0.325 -555E-19
K65 R 0.415 0.415 0
K65 T 0.335 0.335 -555E-19
So why dose the results from proc means and proc sql differs from the other? Thx in advance.
PS: I have tried deleting the observations of 'K64' or 'K65' and the difference just disappear this time.
Upvotes: 0
Views: 456
Reputation: 27498
The statistical engine beneath standard procedures (means, univariate, summary, etc...) is the same, however, SQL statistical engine as you discovered can have very small variation from the Procs engine.
As to why is more a question for the SAS developers. One possibility is the SQL engine may have an extra bit available from the treatment or representation of SQL ISO NULL versus SAS MISSING values (.
through .Z
), which in turn could affect the result.
You can view the underlying bits of the double precision representation using for RB8.
put(SQL,RB8.) format=$hex16. as SQL_RB8,
put(Means,RB8.) format=$hex16. as Means_RB8
RandNo Trt SQL Mean Dif SQL_RB8 Means_RB8.
--------------------------------------------------------------------------------------------
K64 R 0.325 0.325 0 CDCCCCCCCCCCD43F CDCCCCCCCCCCD43F
K64 T 0.325 0.325 -555E-19 CCCCCCCCCCCCD43F CDCCCCCCCCCCD43F
K65 R 0.415 0.415 0 90C2F5285C8FDA3F 90C2F5285C8FDA3F
K65 T 0.335 0.335 -555E-19 703D0AD7A370D53F 713D0AD7A370D53F
When there is miniscule difference you see
CC...
and CD...
for .32570...
and 71...
for .335The difference is in a very lower order bit. Lookup IEEE 754 if you want to learn more about the nitty gritty of storing decimal values in a double precision space.
Upvotes: 1
Reputation: 9569
I'm guessing that one proc applies fuzzing by default but the other doesn't. It's difficult to say exactly why, other than 'legacy reasons'.
Upvotes: 0