Alex

Reputation: 15708

How do you get R's null and residual deviance equivalents in Matlab fitglm?

In R, after fitting a glm, summary() reports the residual deviance and the null deviance, which tell you how well your model fits compared with a model containing only the intercept term. For the example model:

model <- glm(formula = am ~ mpg + qsec, data=mtcars, family=binomial)

we have:

> summary(model)
...
    Null deviance: 43.2297  on 31  degrees of freedom
Residual deviance:  7.5043  on 29  degrees of freedom
AIC: 13.504
...

In Matlab, fitglm returns an object of the GeneralizedLinearModel class, whose Deviance property contains the residual deviance. However, I can't find anything directly related to the null deviance. What is the easiest way to calculate it?

Example Matlab code:

load fisheriris.mat
model = fitglm(meas(:, 1), ismember(species, {'setosa'}), 'Distribution', 'binomial')

produces:

model = 


Generalized Linear regression model:
    logit(y) ~ 1 + x1
    Distribution = Binomial

Estimated Coefficients:
                       Estimate                SE                  tStat                 pValue       
                   _________________    _________________    _________________    ____________________

    (Intercept)     27.8285213954246      4.8275686220899     5.76450042948896    8.19000695766331e-09
    x1             -5.17569812610148    0.893399843474784    -5.79326061438645    6.90328570107794e-09


150 observations, 148 error degrees of freedom
Dispersion: 1
Chi^2-statistic vs. constant model: 119, p-value = 9.87e-28

with a residual deviance of model.Deviance:

>> model.Deviance

ans =

          71.8363992272217

Upvotes: 3

Views: 1792

Answers (2)

Jean-Paul

Reputation: 21160

I wrote a GLM class for Matlab which gives the same results as R:

Generalized Linear Models in Matlab (same results as in R)

For example, a log-link GLM with gamma distribution on sample data gives this in R:

Call:
glm(formula = MilesPerGallon ~ Horsepower + Acceleration + Cylinders, 
    family = Gamma(link = log), data = data)

Deviance Residuals: 
      Min         1Q     Median         3Q        Max  
-0.116817  -0.075084   0.004179   0.060545   0.197108  

Coefficients:
              Estimate Std. Error z value Pr(>|z|)    
(Intercept)   4.955205   0.509903   9.718  < 2e-16 ***
Horsepower   -0.017605   0.004352  -4.046 5.21e-05 ***
Acceleration -0.026137   0.015540  -1.682   0.0926 .  
Cylinders     0.093277   0.054458   1.713   0.0867 .  
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for Gamma family taken to be 0.0133)

    Null deviance: 0.388832  on 10  degrees of freedom
Residual deviance: 0.093288  on  7  degrees of freedom
AIC: 64.05

Number of Fisher Scoring iterations: 4

Pearson MSE:  0.008783281 
Deviance MSE:  0.008480725 
McFadden R^2:  0.7600815 

Using the package, this same estimation gives the following result in Matlab:

 :: convergence in 4 iterations
 ------------------------------------------------------------------------------------------
    dependent: MilesPerGallon
  independent: (Intercept),Horsepower,Acceleration,Cylinders
 ------------------------------------------------------------------------------------------
  log(E[MilesPerGallon]) = ß1×(Intercept) + ß2×Horsepower + ß3×Acceleration + ß4×Cylinders
 ------------------------------------------------------------------------------------------
 distribution: GAMMA
         link: LOG
       weight: -
       offset: -
 ============================================================
     Variable    Estimate     S.E.    z-value    Pr(>|z|)
 ============================================================
   (Intercept)      4.955     0.510    9.708     0.00000
    Horsepower     -0.018     0.004   -4.042     0.00005
  Acceleration     -0.026     0.016   -1.680     0.09290
     Cylinders      0.093     0.055    1.711     0.08706
 ============================================================
  Residual deviance:     0.0933     Deviance MSE: 0.0085
  Null deviance:         0.3888     Pearson MSE:  0.0088
  Dispersion:            0.0133     Deviance IC:  0.1026
  McFadden R^2:          0.7601     Residual df:  7.0000
 ============================================================

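So the output matches R's to the displayed precision, apart from small differences in the z-values (for example, 9.708 versus 9.718 for the intercept). Hope this helps someone out.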

Upvotes: 2

Alex

Reputation: 15708

If fitglm is called with a table and the regression is specified in Wilkinson notation, the resulting GeneralizedLinearModel object, model, has properties that let us retrieve the table used to fit the model (Variables), the response name (ResponseName), and the distribution (Distribution).

Since the null deviance from R is just the deviance of the model with only an intercept fitted, we can find it by fitting a null_deviance_model using the above information:

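% Refit on the same table with an intercept-only formula, e.g. 'y ~ 1'.
% fitglm replaces the older GeneralizedLinearModel.fit call.
null_deviance_model = fitglm(model.Variables, ...
      [model.ResponseName, ' ~ 1'], 'Distribution', model.Distribution.Name);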

The null deviance from R is given by null_deviance_model.Deviance.

I am not sure whether this extends to regressions using matrices and vectors for the covariates/response.
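For the matrix/vector case, here is a minimal sketch using the fisheriris example from the question. It assumes the Statistics and Machine Learning Toolbox is available and that passing 'constant' as the model specification to fitglm fits an intercept-only model on the same data. The variable names (null_model, null_deviance_closed) are my own, not part of any API. As a sanity check for a 0/1 response, the intercept-only fit predicts mean(y) for every observation, so the null deviance can also be computed in closed form.

```matlab
% Assumes the Statistics and Machine Learning Toolbox (fitglm, fisheriris).
load fisheriris.mat
X = meas(:, 1);
y = ismember(species, {'setosa'});   % logical 0/1 response, as in the question

% Full model, as in the question
model = fitglm(X, y, 'Distribution', 'binomial');

% Intercept-only model on the same data ('constant' model specification)
null_model = fitglm(X, y, 'constant', 'Distribution', 'binomial');
null_deviance = null_model.Deviance;

% Closed-form cross-check for a binary response: the intercept-only fit
% predicts p0 = mean(y) everywhere, so deviance = -2 * log-likelihood at p0.
p0 = mean(y);
null_deviance_closed = -2 * sum(y .* log(p0) + (1 - y) .* log(1 - p0));

fprintf('Null deviance (fit):         %.4f\n', null_deviance);
fprintf('Null deviance (closed form): %.4f\n', null_deviance_closed);
fprintf('Residual deviance:           %.4f\n', model.Deviance);
```

For this data the closed form gives about 190.95, which is consistent with the residual deviance of 71.84 plus the "Chi^2-statistic vs. constant model: 119" that fitglm prints. The closed-form check only applies to an unweighted 0/1 response with no offset; for other distributions, refitting with 'constant' is the safer route.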

Upvotes: 0
