Aaraeus
Aaraeus

Reputation: 1155

Why does finding the average in a proc means and a proc sql step yield different results according to a proc compare?

I have the following code. The datasets created are test_means and test_sql. Both are taking the simple average of a variable of your choice.

Both create exactly the same number by eye.

Why is it that the proc compare says the values are unequal to a stupidly small value, like 0E-14? Is this something to do with the way numbers are stored in both procedures?

%let var=; *Insert numeric variable to check;
%let dsn=; *Insert dataset name;

proc means noprint mean data=&dsn.;
    var &var.;
    output out=test_means (keep=_STAT_ &var. where=(_STAT_="MEAN"));
run;

proc sql;
    create table test_sql as select
        "MEAN" as _STAT_ length=8,
        mean(&var.) as &var.
    from &dsn.
    ;
quit;

proc compare data=test_means compare=test_sql;
run;

Upvotes: 0

Views: 323

Answers (1)

Joe
Joe

Reputation: 63424

Differences on the order of E-14 are to be expected from calculations done on floating point values (as SAS numbers are). This isn't particularly related to the different PROCs; all it would take would be summing values in a different order to produce an error along those lines. Even two different runs in PROC SQL could generate differences of this magnitude, if rows end up processed differently (due to multithreading, for example).

PROC COMPARE runs should generally be done with the FUZZ option, unless you are comparing numbers with very small magnitudes. This should usually be part of standard practice, unless you specifically want to see this kind of difference (meaning, unless you want to verify two files are the same file, and not just the same values).

Upvotes: 1

Related Questions