Tom
Tom

Reputation: 2341

Attempting to recreate official Stata SEM example in the R-package lavaan

I want to recreate this official Stata example with the R-package lavaan, because I want to know if the same model leads to the same result in both packages.

I am using the same data (I uploaded the data using Stata, ten saved is as a .dta and uploaded that into R) and graphically the model structure looks identical. My issue is that the estimates are not remotely close.

Nevertheless I am not completely sure whether I am doing the same thing in both regressions (perhaps because the assumptions which follow from the two syntaxes are different).

Could someone point out to me if I am doing anything different?

Stata syntax

webuse sem_sm2, clear
sem (anomia67 pwless67 <- Alien67) (anomia71 pwless71 <- Alien71) (Alien67 <- SES) (Alien71 <- Alien67 SES) (    SES -> educ occstat66), nolog 

R-lavaan syntax

library(haven)
library(lavaan)
library(lavaanPlot)


model <- '
Alien67 =~ anomia67 + pwless67 
Alien71  =~ anomia71 + pwless71 
Alien67 ~ SES 
Alien71 ~ Alien67 + SES     
SES =~ educ66 + occstat66
'
model_fit <- sem(model, data=example)
summary(model_fit , standardized=TRUE, fit.measures=TRUE)

lavaanPlot(model = model_fit, node_options = list(shape = "box", fontname = "Helvetica"), edge_options = list(color = "grey"), coefs = FALSE)

Stata output

Endogenous variables

Measurement:  anomia67 pwless67 anomia71 pwless71 educ66 occstat66
Latent:       Alien67 Alien71

Exogenous variables

Latent:       SES

Structural equation model                       Number of obs     =        932
Estimation method  = ml
Log likelihood     = -15246.469

 ( 1)  [anomia67]Alien67 = 1
 ( 2)  [anomia71]Alien71 = 1
 ( 3)  [educ66]SES = 1
---------------------------------------------------------------------------------
                |                 OIM
                |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
----------------+----------------------------------------------------------------
Structural      |
  Alien67       |
            SES |  -.6140404   .0562407   -10.92   0.000    -.7242701   -.5038107
  --------------+----------------------------------------------------------------
  Alien71       |
        Alien67 |   .7046342   .0533512    13.21   0.000     .6000678    .8092007
            SES |  -.1744153   .0542489    -3.22   0.001    -.2807413   -.0680894
----------------+----------------------------------------------------------------
Measurement     |
  anomia67      |
        Alien67 |          1  (constrained)
          _cons |      13.61   .1126205   120.85   0.000     13.38927    13.83073
  --------------+----------------------------------------------------------------
  pwless67      |
        Alien67 |   .8884887   .0431565    20.59   0.000     .8039034    .9730739
          _cons |      14.67   .1001798   146.44   0.000     14.47365    14.86635
  --------------+----------------------------------------------------------------
  anomia71      |
        Alien71 |          1  (constrained)
          _cons |      14.13   .1158943   121.92   0.000     13.90285    14.35715
  --------------+----------------------------------------------------------------
  pwless71      |
        Alien71 |   .8486022   .0415205    20.44   0.000     .7672235    .9299808
          _cons |       14.9   .1034537   144.03   0.000     14.69723    15.10277
  --------------+----------------------------------------------------------------
  educ66        |
            SES |          1  (constrained)
          _cons |       10.9   .1014894   107.40   0.000     10.70108    11.09892
  --------------+----------------------------------------------------------------
  occstat66     |
            SES |   5.331259   .4307503    12.38   0.000     4.487004    6.175514
          _cons |      37.49   .6947112    53.96   0.000     36.12839    38.85161
----------------+----------------------------------------------------------------
 var(e.anomia67)|   4.009921   .3582978                      3.365724    4.777416
 var(e.pwless67)|   3.187468    .283374                      2.677762    3.794197
 var(e.anomia71)|   3.695593   .3911512                      3.003245     4.54755
 var(e.pwless71)|   3.621531   .3037908                      3.072483    4.268693
   var(e.educ66)|   2.943819   .5002527                      2.109908    4.107319
var(e.occstat66)|     260.63   18.24572                      227.2139    298.9605
  var(e.Alien67)|   5.301416    .483144                      4.434225    6.338201
  var(e.Alien71)|   3.737286   .3881546                      3.048951    4.581019
        var(SES)|    6.65587   .6409484                      5.511067    8.038482
---------------------------------------------------------------------------------
LR test of model vs. saturated: chi2(6)   =     71.62, Prob > chi2 = 0.0000

R-lavaan

lavaan 0.6-11 ended normally after 149 iterations

  Estimator                                         ML
  Optimization method                           NLMINB
  Number of model parameters                        15
                                                      
  Number of observations                            15
                                                      
Model Test User Model:
                                                      
  Test statistic                                26.531
  Degrees of freedom                                 6
  P-value (Chi-square)                           0.000

Model Test Baseline Model:

  Test statistic                               370.547
  Degrees of freedom                                15
  P-value                                        0.000

User Model versus Baseline Model:

  Comparative Fit Index (CFI)                    0.942
  Tucker-Lewis Index (TLI)                       0.856

Loglikelihood and Information Criteria:

  Loglikelihood user model (H0)                -82.504
  Loglikelihood unrestricted model (H1)        -69.238
                                                      
  Akaike (AIC)                                 195.008
  Bayesian (BIC)                               205.629
  Sample-size adjusted Bayesian (BIC)          159.836

Root Mean Square Error of Approximation:

  RMSEA                                          0.478
  90 Percent confidence interval - lower         0.301
  90 Percent confidence interval - upper         0.670
  P-value RMSEA <= 0.05                          0.000

Standardized Root Mean Square Residual:

  SRMR                                           0.002

Parameter Estimates:

  Standard errors                             Standard
  Information                                 Expected
  Information saturated (h1) model          Structured

Latent Variables:
                   Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
  Alien67 =~                                                            
    anomia67          1.000                               3.388    0.999
    pwless67          1.072    0.016   66.835    0.000    3.632    0.999
  Alien71 =~                                                            
    anomia71          1.000                               3.520    0.999
    pwless71          1.049    0.014   74.224    0.000    3.694    0.999
  SES =~                                                                
    educ66            1.000                               2.794    0.993
    occstat66         3.608    0.267   13.528    0.000   10.081    0.969

Regressions:
                   Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
  Alien67 ~                                                             
    SES               1.176    0.087   13.498    0.000    0.970    0.970
  Alien71 ~                                                             
    Alien67           0.996    0.060   16.505    0.000    0.959    0.959
    SES               0.053    0.073    0.727    0.467    0.042    0.042

Variances:
                   Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
   .anomia67          0.018    0.011    1.642    0.101    0.018    0.002
   .pwless67          0.024    0.013    1.810    0.070    0.024    0.002
   .anomia71          0.013    0.009    1.359    0.174    0.013    0.001
   .pwless71          0.023    0.012    1.875    0.061    0.023    0.002
   .educ66            0.104    0.153    0.680    0.497    0.104    0.013
   .occstat66         6.514    3.063    2.127    0.033    6.514    0.060
   .Alien67           0.678    0.327    2.074    0.038    0.059    0.059
   .Alien71           0.012    0.012    0.998    0.318    0.001    0.001
    SES               7.808    2.892    2.699    0.007    1.000    1.000

enter image description here

EDIT:

sem_sm2 <- structure(list(`_group` = structure(c(1, 1, 1, 1, 1, 1, 1, 1, 
1, 1, 1, 1, 1, 1, 1), format.stata = "%8.0g"), `_type` = structure(c(1, 
2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15), format.stata = "%8.0g"), 
    educ66 = structure(c(10.9, 3.1, 1, 0.54, -0.34, -0.41, -0.25, 
    0.53, -0.36, -0.41, -0.16, 0.54, -0.35, -0.37, -0.22), label = "Education, 1966", format.stata = "%10.0g"), 
    occstat66 = structure(c(37.49, 21.22, 0.54, 1, -0.25, -0.3, 
    -0.21, 0.82, -0.3, -0.29, -0.19, 0.81, -0.29, -0.28, -0.16
    ), label = "Occupational status, 1966", format.stata = "%10.0g"), 
    anomia66 = structure(c(13.7, 3.6, -0.34, -0.25, 1, 0.7, 0.27, 
    -0.26, 0.67, 0.55, 0.27, -0.25, 0.53, 0.42, 0.26), label = "Anomia, 1966", format.stata = "%10.0g"), 
    pwless66 = structure(c(14.85, 3.31, -0.41, -0.3, 0.7, 1, 
    0.31, -0.31, 0.55, 0.63, 0.26, -0.25, 0.5, 0.51, 0.26), label = "Powerlessness, 1966", format.stata = "%10.0g"), 
    socdist66 = structure(c(1.11, 1.4, -0.25, -0.21, 0.27, 0.31, 
    1, -0.14, 0.26, 0.28, 0.49, -0.21, 0.27, 0.24, 0.47), label = "Latin American social distance, 1966", format.stata = "%10.0g"), 
    occstat67 = structure(c(36.72, 21, 0.53, 0.82, -0.26, -0.31, 
    -0.14, 1, -0.3, -0.33, -0.11, 0.78, -0.32, -0.26, -0.14), label = "Occupational status, 1967", format.stata = "%10.0g"), 
    anomia67 = structure(c(13.61, 3.44, -0.36, -0.3, 0.67, 0.55, 
    0.26, -0.3, 1, 0.66, 0.25, -0.28, 0.56, 0.44, 0.25), label = "Anomia, 1967", format.stata = "%10.0g"), 
    pwless67 = structure(c(14.67, 3.06, -0.41, -0.29, 0.55, 0.63, 
    0.28, -0.33, 0.66, 1, 0.28, -0.26, 0.47, 0.52, 0.27), label = "Powerlessness, 1967", format.stata = "%10.0g"), 
    socdist67 = structure(c(0.89, 1.27, -0.16, -0.19, 0.27, 0.26, 
    0.49, -0.11, 0.25, 0.28, 1, -0.18, 0.29, 0.23, 0.41), label = "Latin American social distance, 1967", format.stata = "%10.0g"), 
    occstat71 = structure(c(37.47, 20.98, 0.54, 0.81, -0.25, 
    -0.25, -0.21, 0.78, -0.28, -0.26, -0.18, 1, -0.31, -0.28, 
    -0.19), label = "Occupational status, 1971", format.stata = "%10.0g"), 
    anomia71 = structure(c(14.13, 3.54, -0.35, -0.29, 0.53, 0.5, 
    0.27, -0.32, 0.56, 0.47, 0.29, -0.31, 1, 0.67, 0.29), label = "Anomia, 1971", format.stata = "%10.0g"), 
    pwless71 = structure(c(14.9, 3.16, -0.37, -0.28, 0.42, 0.51, 
    0.24, -0.26, 0.44, 0.52, 0.23, -0.28, 0.67, 1, 0.28), label = "Powerlessness, 1971", format.stata = "%10.0g"), 
    socdist71 = structure(c(0.78, 1.25, -0.22, -0.16, 0.26, 0.26, 
    0.47, -0.14, 0.25, 0.27, 0.41, -0.19, 0.29, 0.28, 1), label = "Latin American social distance, 1971", format.stata = "%10.0g")), class = c("tbl_df", 
"tbl", "data.frame"), row.names = c(NA, -15L), label = "Structural model with measurement component", notes = c("Summary statistics data from Wheaton, B., Muthen B., Alwin, D.,           & Summers, G., 1977, \"Assessing reliability and stability in           panel models\", in D. R. Heise (Ed.), _Sociological Methodology           1977_ (pp. 84-136), San Francisco: Jossey-Bass, Inc.", 
"Intended use: Create structural model relating Alienation in 1971, Alienation in 1967, and SES in 1966.", 
"3", "Four indicators each measured in 1966, 1967, and 1981, plus another indicator (educ66) measured only in 1966."
))

Upvotes: 5

Views: 460

Answers (2)

Tom
Tom

Reputation: 2341

Because of all the issues with the Stata sample data, and because I still want to know if the same model leads to the same result in both packages, instead of using lavaan in R to recreate the Stata example, I used Stata to recreate the lavaan example. For this example the outcomes are the same.

In R (puts .dta in working directory)

library(lavaan)
library(haven)

library(lavaan) # only needed once per session
model <- '
  # measurement model
    ind60 =~ x1 + x2 + x3
    dem60 =~ y1 + y2 + y3 + y4
    dem65 =~ y5 + y6 + y7 + y8
  # regressions
    dem60 ~ ind60
    dem65 ~ ind60 + dem60
  # residual correlations
    y1 ~~ y5
    y2 ~~ y4 + y6
    y3 ~~ y7
    y4 ~~ y8
    y6 ~~ y8
'
fit <- sem(model, data=PoliticalDemocracy)
summary(fit, standardized=TRUE)

write_dta(PoliticalDemocracy, getwd())

Stata

# Upload PoliticalDemocracy.dta from working directory

sem (Ind60 -> x1  x2  x3) (Dem60 -> y1  y2  y3  y4) (Dem65 -> y5  y6  y7  y8) (Dem60 <- Ind60) (Dem65 <- Ind60  Dem60), cov(e.y1*e.y5) cov(e.y2*e.y4) cov(e.y2*e.y6) cov(e.y3*e.y7) cov(e.y4*e.y8) cov(e.y6*e.y8)

Stata Output

Endogenous variables

Measurement:  x1 x2 x3 y1 y2 y3 y4 y5 y6 y7 y8
Latent:       Dem60 Dem65

Exogenous variables

Latent:       Ind60

Fitting target model:

Iteration 0:   log likelihood = -1573.8662  
Iteration 1:   log likelihood = -1563.3513  
Iteration 2:   log likelihood = -1551.7615  
Iteration 3:   log likelihood = -1547.8534  
Iteration 4:   log likelihood = -1547.7913  
Iteration 5:   log likelihood = -1547.7909  
Iteration 6:   log likelihood = -1547.7909  

Structural equation model                       Number of obs     =         75
Estimation method  = ml
Log likelihood     = -1547.7909

 ( 1)  [y1]Dem60 = 1
 ( 2)  [y5]Dem65 = 1
 ( 3)  [x1]Ind60 = 1
-------------------------------------------------------------------------------
              |                 OIM
              |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
--------------+----------------------------------------------------------------
Structural    |
  Dem60       |
        Ind60 |      1.483   .3972561     3.73   0.000      .704392    2.261607
  ------------+----------------------------------------------------------------
  Dem65       |
        Dem60 |   .8373443   .0987884     8.48   0.000     .6437227    1.030966
        Ind60 |   .5723368   .2337326     2.45   0.014     .1142294    1.030444
--------------+----------------------------------------------------------------
Measurement   |
  x1          |
        Ind60 |          1  (constrained)
        _cons |   5.054384   .0840624    60.13   0.000     4.889625    5.219143
  ------------+----------------------------------------------------------------
  x2          |
        Ind60 |   2.180368    .139012    15.68   0.000     1.907909    2.452826
        _cons |   4.792195   .1732697    27.66   0.000     4.452592    5.131797
  ------------+----------------------------------------------------------------
  x3          |
        Ind60 |   1.818511   .1521907    11.95   0.000     1.520223    2.116799
        _cons |    3.55769   .1612318    22.07   0.000     3.241681    3.873698
  ------------+----------------------------------------------------------------
  y1          |
        Dem60 |          1  (constrained)
        _cons |   5.464667   .3018536    18.10   0.000     4.873044    6.056289
  ------------+----------------------------------------------------------------
  y2          |
        Dem60 |   1.256746      .1855     6.77   0.000     .8931728     1.62032
        _cons |   4.256443   .4498704     9.46   0.000     3.374713    5.138173
  ------------+----------------------------------------------------------------
  y3          |
        Dem60 |   1.057716   .1483352     7.13   0.000     .7669848    1.348448
        _cons |    6.56311   .3758849    17.46   0.000     5.826389    7.299831
  ------------+----------------------------------------------------------------
  y4          |
        Dem60 |   1.264786   .1507281     8.39   0.000     .9693647    1.560208
        _cons |   4.452533   .3839095    11.60   0.000     3.700084    5.204982
  ------------+----------------------------------------------------------------
  y5          |
        Dem65 |          1  (constrained)
        _cons |   5.136252   .3005112    17.09   0.000     4.547261    5.725243
  ------------+----------------------------------------------------------------
  y6          |
        Dem65 |   1.185697   .1713491     6.92   0.000     .8498588    1.521535
        _cons |   2.978074   .3859332     7.72   0.000     2.221659    3.734489
  ------------+----------------------------------------------------------------
  y7          |
        Dem65 |   1.279513   .1603798     7.98   0.000      .965174    1.593851
        _cons |   6.196264   .3771994    16.43   0.000     5.456967    6.935561
  ------------+----------------------------------------------------------------
  y8          |
        Dem65 |   1.265948   .1632167     7.76   0.000     .9460489    1.585847
        _cons |    4.04339   .3713215    10.89   0.000     3.315613    4.771166
--------------+----------------------------------------------------------------
     var(e.x1)|   .0815494   .0197151                      .0507735    .1309798
     var(e.x2)|   .1198064   .0699663                      .0381402    .3763369
     var(e.x3)|   .4667027   .0891824                      .3209105    .6787294
     var(e.y1)|   1.891395   .4687497                      1.163661    3.074241
     var(e.y2)|   7.372868   1.345726                       5.15551     10.5439
     var(e.y3)|   5.067469    .968278                      3.484543    7.369471
     var(e.y4)|   3.147905   .7557987                      1.966308    5.039547
     var(e.y5)|   2.350971   .4887454                      1.564193    3.533494
     var(e.y6)|   4.953964   .8954399                      3.476133    7.060077
     var(e.y7)|   3.431375   .7278048                      2.264257    5.200086
     var(e.y8)|   3.254083   .7070144                      2.125632    4.981603
  var(e.Dem60)|   3.956032    .944614                      2.477488     6.31696
  var(e.Dem65)|   .1724814   .2203083                      .0141096    2.108488
    var(Ind60)|   .4484374   .0867427                      .3069383    .6551677
--------------+----------------------------------------------------------------
cov(e.y1,e.y5)|   .6236715    .368993     1.69   0.091    -.0995414    1.346884
cov(e.y2,e.y4)|   1.313114   .6987846     1.88   0.060     -.056479    2.682706
cov(e.y2,e.y6)|    2.15286   .7262961     2.96   0.003     .7293463    3.576375
cov(e.y3,e.y7)|   .7949629   .6211629     1.28   0.201    -.4224939     2.01242
cov(e.y4,e.y8)|   .3482249   .4578603     0.76   0.447    -.5491648    1.245614
cov(e.y6,e.y8)|   1.356165   .5719805     2.37   0.018     .2351034    2.477226
-------------------------------------------------------------------------------
LR test of model vs. saturated: chi2(35)  =     38.13, Prob > chi2 = 0.3292

Lavaan Output

lavaan 0.6-8 ended normally after 68 iterations

  Estimator                                         ML
  Optimization method                           NLMINB
  Number of model parameters                        31
                                                      
  Number of observations                            75
                                                      
Model Test User Model:
                                                      
  Test statistic                                38.125
  Degrees of freedom                                35
  P-value (Chi-square)                           0.329

Parameter Estimates:

  Standard errors                             Standard
  Information                                 Expected
  Information saturated (h1) model          Structured

Latent Variables:
                   Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
  ind60 =~                                                              
    x1                1.000                               0.670    0.920
    x2                2.180    0.139   15.742    0.000    1.460    0.973
    x3                1.819    0.152   11.967    0.000    1.218    0.872
  dem60 =~                                                              
    y1                1.000                               2.223    0.850
    y2                1.257    0.182    6.889    0.000    2.794    0.717
    y3                1.058    0.151    6.987    0.000    2.351    0.722
    y4                1.265    0.145    8.722    0.000    2.812    0.846
  dem65 =~                                                              
    y5                1.000                               2.103    0.808
    y6                1.186    0.169    7.024    0.000    2.493    0.746
    y7                1.280    0.160    8.002    0.000    2.691    0.824
    y8                1.266    0.158    8.007    0.000    2.662    0.828

Regressions:
                   Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
  dem60 ~                                                               
    ind60             1.483    0.399    3.715    0.000    0.447    0.447
  dem65 ~                                                               
    ind60             0.572    0.221    2.586    0.010    0.182    0.182
    dem60             0.837    0.098    8.514    0.000    0.885    0.885

Covariances:
                   Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
 .y1 ~~                                                                 
   .y5                0.624    0.358    1.741    0.082    0.624    0.296
 .y2 ~~                                                                 
   .y4                1.313    0.702    1.871    0.061    1.313    0.273
   .y6                2.153    0.734    2.934    0.003    2.153    0.356
 .y3 ~~                                                                 
   .y7                0.795    0.608    1.308    0.191    0.795    0.191
 .y4 ~~                                                                 
   .y8                0.348    0.442    0.787    0.431    0.348    0.109
 .y6 ~~                                                                 
   .y8                1.356    0.568    2.386    0.017    1.356    0.338

Variances:
                   Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
   .x1                0.082    0.019    4.184    0.000    0.082    0.154
   .x2                0.120    0.070    1.718    0.086    0.120    0.053
   .x3                0.467    0.090    5.177    0.000    0.467    0.239
   .y1                1.891    0.444    4.256    0.000    1.891    0.277
   .y2                7.373    1.374    5.366    0.000    7.373    0.486
   .y3                5.067    0.952    5.324    0.000    5.067    0.478
   .y4                3.148    0.739    4.261    0.000    3.148    0.285
   .y5                2.351    0.480    4.895    0.000    2.351    0.347
   .y6                4.954    0.914    5.419    0.000    4.954    0.443
   .y7                3.431    0.713    4.814    0.000    3.431    0.322
   .y8                3.254    0.695    4.685    0.000    3.254    0.315
    ind60             0.448    0.087    5.173    0.000    1.000    1.000
   .dem60             3.956    0.921    4.295    0.000    0.800    0.800
   .dem65             0.172    0.215    0.803    0.422    0.039    0.039

Upvotes: 2

Andrew Brown
Andrew Brown

Reputation: 1065

Assuming you are using haven to read this .dta file then you are using sample data that are not intended for use in analysis, which is clearly stated on this page: https://www.stata-press.com/data/r17/sem.html

Datasets used in the Stata documentation were selected to demonstrate how to use Stata. Some datasets have been altered to explain a particular feature. Do not use these datasets for analysis.

enter image description here

Unfortunately the sample data only include 15 of the 932 observations shown in this example

x <- haven::read_dta("https://www.stata-press.com/data/r17/sem_sm2.dta")
sum(sapply(x, length))
#> [1] 225
sum(nrow(x))
#> [1] 15

Upvotes: 3

Related Questions