Jason Smith

Reputation: 29

Four-way ANOVA in SAS: my code does not display the F values and p-values correctly because of the error DF. Where am I going wrong?

                     Alloy 97-1-1-1             Alloy AuCa
Dentist  Method   1500°F  1600°F  1700°F   1500°F  1600°F  1700°F

1        1           813     792     792      907     792     835
         2           782     698     665     1115     835     870
         3           752     620     835      847     560     585

2        1           715     803     813      858     907     882
         2           772     782     743      933     792     824
         3           835     715     673      698     734     681

3        1           743     627     752      858     762     724
         2           813     743     613      824     847     782
         3           743     681     743      715     824     681

4        1           792     743     762      894     792     649
         2           690     882     772      813     870     858
         3           493     707     289      715     813     312

5        1           707     698     715      772    1048     870
         2           803     665     752      824     933     835
         3           421     483     405      536     405     312

Here is my SAS code for the above data:

data gold;
do dentist=1, 2, 3, 4, 5;
    do method=1, 2, 3;
        do alloy=1, 2;
            do temp=1500, 1600, 1700;
                input y @@; output;
            end;
        end;
    end;
end;

cards;
813 792 792     907 792 835
782 698 665     1115 835 870
752 620 835     847 560 585

715 803 813     858 907 882
772 782 743     933 792 824
835 715 673     698 734 681

743 627 752     858 762 724
813 743 613     824 847 782
743 681 743     715 824 681

792 743 762     894 792 649
690 882 772     813 870 858
493 707 289     715 813 312

707 698 715     772 1048 870
803 665 752     824 933 835
421 483 405     536 405 312
;
ODS graphics on;
proc GLM data=gold;
class dentist method alloy temp;
model y=dentist|method|alloy|temp;
run; quit;

Where did I go wrong?

Here is part of the output:

The GLM Procedure
Dependent Variable: y 

    Source   DF   Sum of Squares   Mean Square   F Value   Pr > F
    Model    89   1891095.556      21248.265     .         .
    Error     0         0.000      .
    Total    89   1891095.556

    R-Square   Coeff Var   Root MSE   y Mean
    1.000000   .           .          741.7778

The error line is supposed to be: Residuals, sum of squares 75772.0, DF 16, mean square 4735.7.

The residual/error is not supposed to be 0; because of that, the whole code is wrong. :(

I also need to know how to create an interaction plot for the above code. Any help with my code would be highly appreciated.

Upvotes: 0

Views: 787

Answers (1)

Dirk Horsten

Reputation: 3845

This is a classic example of overfitting.

You have only 90 measurements, i.e. 1 intercept plus 89 degrees of freedom (DF) in total. To fit those, you are using

  • 1 intercept
  • plus 5 parameters for the different dentists, with one constraint: they must sum to 0, i.e. 4 DF
  • plus 3 parameters for the methods, minus one constraint again, i.e. 2 DF
  • plus 15 parameters for the combinations of dentist and method, which must meet the 8 constraints below. As these constraints are not completely independent, they reduce the DF by only 7, i.e. you allow GLM 8 DF
    • for every dentist, the parameters for all methods must sum to 0 and
    • for every method, the parameters for all dentists must sum to 0

and so forth.

In short, you allow the GLM procedure to choose 1 intercept plus 89 other DF to fit only 90 values. GLM can produce a model that fits your data exactly. No wonder the model is without error!
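The DF bookkeeping above can be checked with a few lines of arithmetic. The following is a minimal sketch in Python (not SAS, purely for the counting), assuming the usual rule that a main effect or interaction uses the product of (levels - 1) over its factors. The names `levels` and `term_df` are made up for this illustration.

```python
from itertools import combinations
from math import prod

# Levels per factor, taken from the design in the question.
levels = {"dentist": 5, "method": 3, "alloy": 2, "temp": 3}

def term_df(term):
    """DF of a main effect or interaction: product of (levels - 1)."""
    return prod(levels[f] - 1 for f in term)

# Every main effect and interaction implied by dentist|method|alloy|temp.
terms = [t for k in range(1, len(levels) + 1)
         for t in combinations(levels, k)]

model_df = sum(term_df(t) for t in terms)
n_obs = prod(levels.values())        # one measurement per cell
error_df = (n_obs - 1) - model_df    # corrected total DF minus model DF

for t in terms:
    print("*".join(t), term_df(t))
print("model DF:", model_df, "error DF:", error_df)
```

With one measurement per cell, the model DF add up to 89 and nothing is left for the error term. The four-way term dentist*method*alloy*temp alone accounts for 16 DF, which is the same number as the residual DF the question expects; whether the expected table was produced by leaving that term out of the model is not stated in the thread.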

To understand it better:

Introduce fake measurements which slightly differ from the real ones, for instance this way

data gold;
do dentist=1, 2, 3, 4, 5;
    do method=1, 2, 3;
        do alloy= 1,2;
            do temp=1500, 1600, 1700; 
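                input y @@; 
                output;                            /* the real measurement */
                y + 0.1 * rand('NORMAL', 0, 500);  /* sum statement: adds noise with SD 50 to y */
                output;                            /* the fake replicate */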
            end;
        end;
    end;
end;
cards;

Now your output might look like

Source              DF      Sum of Squares  Mean Square     F Value    Pr > F
Model               89      19556981.91     219741.37       1.45       0.0403
Error               90      13643754.57     151597.27
Corrected Total     179     33200736.48

R-Square        Coeff Var   Root MSE        y Mean
0.589053        51.89       389.3549        750.2041

(Not exactly, as I introduced some randomness.) Indeed, you still give GLM one intercept and 89 parameters (DF) to choose, but you now ask it to fit 180 values (1 intercept and 179 DF).
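A minimal Python sketch of why the second observation per cell matters, assuming the full factorial model keeps its 89 DF and only the number of observations changes. The function name `error_df` is made up for this illustration.

```python
def error_df(cells, replicates, model_df):
    """Error DF = corrected total DF - model DF."""
    n_obs = cells * replicates
    return (n_obs - 1) - model_df

cells = 5 * 3 * 2 * 3      # dentist x method x alloy x temp
model_df = cells - 1       # the full factorial model uses 89 DF

for replicates in (1, 2):
    print(replicates, "observation(s) per cell ->",
          error_df(cells, replicates, model_df), "error DF")
```

One observation per cell leaves 0 error DF; two per cell leave 90, which matches the Error line in the output above.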

What you should do (unless you ask the dentists to do 90 extra measurements) is to choose a simpler model. I suppose you are not interested in evaluating dentists, but only techniques, i.e. methods, alloys and temperatures, so write

proc GLM data=gold;
    class dentist method alloy temp;
    model y=method|alloy|temp; ** <- nothing about dentists here **;
run; quit;

and the result will be:

Dependent Variable: y
Source              DF      Sum of Squares  Mean Square     F Value     Pr > F
Model               17      905055.156      53238.539       3.89        <.0001
Error               72      986040.4        13695.006
Corrected Total     89      1891095.556

R-Square            Coeff Var   Root MSE        y Mean
0.478588            15.77638    117.0257        741.7778

This tells you the simpler model explains so much more about your numbers (Mean Square 53238.539) than the 'error' it does not explain (Mean Square 13695.006) that it is extremely improbable (less than 0.01% probable) that this is by chance.
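The relation between the columns of that table can be sketched in Python. This only redoes the arithmetic on the sums of squares and DF printed above; the p-value needs the F distribution and is not recomputed here. Variable names are chosen for this illustration.

```python
from math import sqrt

# Values copied from the ANOVA table of the reduced model.
ss_model, df_model = 905055.156, 17
ss_error, df_error = 986040.4, 72

ms_model = ss_model / df_model            # Mean Square of the model
ms_error = ss_error / df_error            # Mean Square of the error
f_value = ms_model / ms_error             # F Value = ratio of the two
r_square = ss_model / (ss_model + ss_error)
root_mse = sqrt(ms_error)

print(f"F = {f_value:.2f}, R-Square = {r_square:.6f}, Root MSE = {root_mse:.4f}")
```

The F value is simply the ratio of the two Mean Squares, so it can only be computed when the error line has at least 1 DF.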

The last part of your output

Source              DF      Type III SS     Mean Square     F Value     Pr > F
method              2       593427.4889     296713.7444     21.67       <.0001
alloy               1       105815.5111     105815.5111     7.73        0.0069
method*alloy        2       54685.0889      27342.5444      2.00        0.1433
temp                2       82178.0222      41089.0111      3.00        0.0560
method*temp         4       30652.4444      7663.1111       0.56        0.6927
alloy*temp          2       21725.3556      10862.6778      0.79        0.4563
method*alloy*temp   4       16571.2444      4142.8111       0.30        0.8754

tells you that

  • it is very likely that the method makes a difference (if it made none, a Mean Square this high would occur by chance with probability below 0.01%)
  • there is statistically significant evidence that the alloy makes a difference (0.69% probability of so high a Mean Square by chance)
  • there is some indication that the temperature makes a difference (5.6% probability of so high a Mean Square by chance); you had better collect more data before you publish this
  • there might be an interaction between method and alloy (Pr > F of 0.1433), but it would require much more data to study it

That is what I would conclude from your experiment.
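As a cross-check on how the Type III table is read, the sketch below (Python, arithmetic only) divides each term's Mean Square by the error Mean Square of 13695.006 from the reduced model. It assumes every term is tested against that same error term, as in the output shown. The dictionary name `type3` is made up for this illustration.

```python
# (DF, Mean Square) per term, copied from the Type III table.
type3 = {
    "method":            (2, 296713.7444),
    "alloy":             (1, 105815.5111),
    "method*alloy":      (2, 27342.5444),
    "temp":              (2, 41089.0111),
    "method*temp":       (4, 7663.1111),
    "alloy*temp":        (2, 10862.6778),
    "method*alloy*temp": (4, 4142.8111),
}
ms_error = 13695.006   # error Mean Square of the reduced model

# F value of each term = its Mean Square / error Mean Square.
f_values = {term: ms / ms_error for term, (_, ms) in type3.items()}
total_df = sum(df for df, _ in type3.values())

for term, f in f_values.items():
    print(f"{term:<18} F = {f:.2f}")
print("model DF:", total_df)
```

The term DF add up to the 17 model DF, and the ratios reproduce the F Value column, with method far ahead of the other terms.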

Upvotes: 1
