user3012614
user3012614

Reputation: 25

Reproduce Stata code in R, binomial GLM

I'm not a Stata user so I'm trying to reproduce Stata results that are given to me in R. I would like to use a GLM with a complementary log-log function. The stata code I have is:

glm c IndA fia, family(binomial s) link(cloglog) offset(offset)

The R code is:

glmt <- glm(data=dataset, c ~ IndA + fia, offset = offset, 
            family = binomial(link = cloglog))

Which yields different results from the Stata output. I think the difference is in the variable s that is included in the binomial family in Stata (bold in the code). How can I incorporate this variable in the R code?

My dataset looks like:

IndA s c itot fia offset

1 23 0 61 0.442622951 -0.494296322
1 25 0 58 0.431034483 -0.544727175
1 27 0 59 0.389830508 -0.527632742
1 31 3 51 0.37254902 -0.673344553
1 28 2 53 0.41509434 -0.634878272
1 26 0 55 0.436363636 -0.597837001

...

where IndA is a dummy variable that is 0 later in the dataset. c is the difference in s (n - (n+1)).

The R output looks like this:

    Call:
        glm(formula = c ~ IndA + fia, family = binomial(link = cloglog), 
        data = dataset, offset = offset)

    Deviance Residuals: 
        Min       1Q   Median       3Q      Max  
     -1.2697  -0.9707  -0.8304   1.3688   1.6390  

     Coefficients:
                  Estimate Std. Error z value Pr(>|z|)
     (Intercept)   1.9633     1.9185   1.023    0.306
     IndA         -0.3174     0.3357  -0.945    0.344
     fia          -5.1155     4.8163  -1.062    0.288

    (Dispersion parameter for binomial family taken to be 1)

    Null deviance: 136.81  on 101  degrees of freedom
    Residual deviance: 134.71  on  99  degrees of freedom
    AIC: 140.71

    Number of Fisher Scoring iterations: 5

The Stata output is a little messed up but looks like this:

    . glm c IndA fia, family(binomial s ) link(cloglog) offset(offset)

    Iteration 0: log likelihood = -144.17967
    Iteration 1: log likelihood = -133.66053
    Iteration 2: log likelihood = -133.58996
    Iteration 3: log likelihood = -133.58992
    Iteration 4: log likelihood = -133.58992

    Generalized linear models No. of obs = 102
    Optimization : ML Residual df = 99
    Scale parameter = 1
    Deviance = 179.1806126 (1/df) Deviance = 1.809905
    Pearson = 203.965157 (1/df) Pearson = 2.060254
    Variance function: V(u) = u*(1-u/s) [Binomial]
    Link function : g(u) = ln(-ln(1-u/s)) [Complementary log-log]
    AIC = 2.678234
    Log likelihood = -133.5899239 BIC = -278.6917
    OIM

    c Coef. Std. Err. z P>|z| [95% Conf. Interval]
    IndA -.7284992 .2308676 -3.16 0.002 -1.180991 -.2760071
    fia -7.147842 3.185532 -2.24 0.025 -13.39137 -.9043128
    _cons .4404201 1.265651 0.35 0.728 -2.040211 2.921051
    offset (offset)

This output is for the whole dataset:

IndA    s   c   itot    fia offset  
1   23  0   61  0.442622951 -0.494296322    
1   25  0   58  0.431034483 -0.544727175    
1   27  0   59  0.389830508 -0.527632742    
1   31  3   51  0.37254902  -0.673344553    
1   28  2   53  0.41509434  -0.634878272    
1   26  0   55  0.436363636 -0.597837001    
1   26  0   52  0.461538462 -0.653926467    
1   27  0   53  0.433962264 -0.634878272    
1   29  1   50  0.42    -0.693147181    
1   28  0   52  0.423076923 -0.653926467    
1   28  0   56  0.392857143 -0.579818495    
1   30  4   50  0.4 -0.693147181    
1   26  0   57  0.421052632 -0.562118918    
1   26  1   56  0.428571429 -0.579818495    
1   25  0   58  0.431034483 -0.544727175    
1   26  0   56  0.428571429 -0.579818495    
1   29  3   54  0.388888889 -0.616186139    
1   26  3   58  0.413793103 -0.544727175    
1   23  0   62  0.435483871 -0.478035801    
1   23  0   62  0.435483871 -0.478035801    
1   25  0   59  0.423728814 -0.527632742    
1   27  3   54  0.425925926 -0.616186139    
1   24  0   60  0.433333333 -0.510825624    
1   25  0   60  0.416666667 -0.510825624    
1   25  0   60  0.416666667 -0.510825624    
1   26  0   57  0.421052632 -0.562118918    
1   27  0   55  0.418181818 -0.597837001    
1   27  0   53  0.433962264 -0.634878272    
1   27  0   55  0.418181818 -0.597837001    
1   29  0   56  0.375   -0.579818495    
1   31  0   53  0.358490566 -0.634878272    
1   31  0   52  0.365384615 -0.653926467    
1   34  0   50  0.32    -0.693147181    
1   34  1   51  0.31372549  -0.673344553    
1       33  5   55  0.309090909 -0.597837001    
1   28  0   60  0.366666667 -0.510825624    
1   28  1   58  0.379310345 -0.544727175    
1   27  0   58  0.396551724 -0.544727175    
1   28  0   58  0.379310345 -0.544727175    
1   28  1   58  0.379310345 -0.544727175    
1   27  0   59  0.389830508 -0.527632742    
1   27  0   59  0.389830508 -0.527632742    
1   27  0   57  0.403508772 -0.562118918    
1   29  1   53  0.396226415 -0.634878272    
1   28  0   55  0.4 -0.597837001    
1   30  1   54  0.37037037  -0.616186139    
1   29  0   54  0.388888889 -0.616186139    
1   31  1   50  0.38    -0.693147181    
1   30  0   57  0.350877193 -0.562118918    
1   30  4   57  0.350877193 -0.562118918    
1   26  0   61  0.393442623 -0.494296322    
0   16  0   61  0.442622951 -0.494296322    
0   17  3   58  0.431034483 -0.544727175    
0   14  0   59  0.389830508 -0.527632742    
0   18  0   51  0.37254902  -0.673344553    
0   19  0   53  0.41509434  -0.634878272    
0   19  0   55  0.436363636 -0.597837001    
0   22  2   52  0.461538462 -0.653926467    
0   20  0   53  0.433962264 -0.634878272    
0   21  1   50  0.42    -0.693147181    
0   20  4   52  0.423076923 -0.653926467    
0   16  0   56  0.392857143 -0.579818495    
0   20  3   50  0.4 -0.693147181    
0   17  0   57  0.421052632 -0.562118918    
0   18  1   56  0.428571429 -0.579818495    
0   17  0   58  0.431034483 -0.544727175    
0   18  1   56  0.428571429 -0.579818495    
0   17  1   54  0.388888889 -0.616186139    
0   16  1   58  0.413793103 -0.544727175    
0   15  0   62  0.435483871 -0.478035801    
0   15  0   62  0.435483871 -0.478035801    
0   16  0   59  0.423728814 -0.527632742    
0   19  3   54  0.425925926 -0.616186139    
0   16  1   60  0.433333333 -0.510825624    
0   15  0   60  0.416666667 -0.510825624    
0   15  0   60  0.416666667 -0.510825624    
0   17  0   57  0.421052632 -0.562118918    
0   18  0   55  0.418181818 -0.597837001    
0   20  2   53  0.433962264 -0.634878272    
0   18  3   55  0.418181818 -0.597837001    
0   15  0   56  0.375   -0.579818495    
0   16  0   53  0.358490566 -0.634878272    
0   17  1   52  0.365384615 -0.653926467    
0   16  1   50  0.32    -0.693147181    
0   15  3   51  0.31372549  -0.673344553    
0   12  0   55  0.309090909 -0.597837001    
0   12  0   60  0.366666667 -0.510825624    
0   14  0   58  0.379310345 -0.544727175    
0   15  1   58  0.396551724 -0.544727175    
0   14  0   58  0.379310345 -0.544727175    
0   14  0   58  0.379310345 -0.544727175    
0   14  0   59  0.389830508 -0.527632742    
0   14  0   59  0.389830508 -0.527632742    
0   16  0   57  0.403508772 -0.562118918    
0   18  1   53  0.396226415 -0.634878272    
0   17  1   55  0.4 -0.597837001    
0   16  0   54  0.37037037  -0.616186139    
0   17  0   54  0.388888889 -0.616186139    
0   19  6   50  0.38    -0.693147181    
0   13  0   57  0.350877193 -0.562118918    
0   13  0   57  0.350877193 -0.562118918    
0   13  1   61  0.393442623 -0.494296322     

Hope this helps.
Thanks in advance!

Upvotes: 0

Views: 711

Answers (1)

zx8754
zx8754

Reputation: 56219

Stata family name

familyname               Description
-------------------------------------------------------------------------
gaussian                 Gaussian (normal)
igaussian                inverse Gaussian
binomial[varnameN|#N]    Bernoulli/binomial
poisson                  Poisson
nbinomial[#k|ml]         negative binomial
gamma                    gamma

Seems like s is one of your variables in dataset. It is hard to tell where this variable should go in R glm without seeing your data structure.

See this link for stata glm examples.

stata glm manual

The binomial distribution can be specified as 1) family(binomial), 2) family(binomial #N), or 3) family(binomial varnameN). In case 2, #N is the value of the binomial denominator N, the number of trials. Specifying family(binomial 1) is the same as specifying family(binomial). In case 3, varnameN is the variable containing the binomial denominator, allowing the number of trials to vary across observations.

Upvotes: 3

Related Questions