Attempting to recreate official Stata SEM example in the R-package lavaan

Question

I want to recreate this official Stata example with the R-package lavaan, because I want to know if the same model leads to the same result in both packages.

I am using the same data (I uploaded the data using Stata, ten saved is as a .dta and uploaded that into R) and graphically the model structure looks identical. My issue is that the estimates are not remotely close.

Nevertheless I am not completely sure whether I am doing the same thing in both regressions (perhaps because the assumptions which follow from the two syntaxes are different).

Could someone point out to me if I am doing anything different?

Stata syntax

webuse sem_sm2, clear
sem (anomia67 pwless67 <- Alien67) (anomia71 pwless71 <- Alien71) (Alien67 <- SES) (Alien71 <- Alien67 SES) (    SES -> educ occstat66), nolog

R-lavaan syntax

library(haven)
library(lavaan)
library(lavaanPlot)


model <- '
Alien67 =~ anomia67 + pwless67 
Alien71  =~ anomia71 + pwless71 
Alien67 ~ SES 
Alien71 ~ Alien67 + SES     
SES =~ educ66 + occstat66
'
model_fit <- sem(model, data=example)
summary(model_fit , standardized=TRUE, fit.measures=TRUE)

lavaanPlot(model = model_fit, node_options = list(shape = "box", fontname = "Helvetica"), edge_options = list(color = "grey"), coefs = FALSE)

Stata output

Endogenous variables

Measurement:  anomia67 pwless67 anomia71 pwless71 educ66 occstat66
Latent:       Alien67 Alien71

Exogenous variables

Latent:       SES

Structural equation model                       Number of obs     =        932
Estimation method  = ml
Log likelihood     = -15246.469

 ( 1)  [anomia67]Alien67 = 1
 ( 2)  [anomia71]Alien71 = 1
 ( 3)  [educ66]SES = 1
---------------------------------------------------------------------------------
                |                 OIM
                |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
----------------+----------------------------------------------------------------
Structural      |
  Alien67       |
            SES |  -.6140404   .0562407   -10.92   0.000    -.7242701   -.5038107
  --------------+----------------------------------------------------------------
  Alien71       |
        Alien67 |   .7046342   .0533512    13.21   0.000     .6000678    .8092007
            SES |  -.1744153   .0542489    -3.22   0.001    -.2807413   -.0680894
----------------+----------------------------------------------------------------
Measurement     |
  anomia67      |
        Alien67 |          1  (constrained)
          _cons |      13.61   .1126205   120.85   0.000     13.38927    13.83073
  --------------+----------------------------------------------------------------
  pwless67      |
        Alien67 |   .8884887   .0431565    20.59   0.000     .8039034    .9730739
          _cons |      14.67   .1001798   146.44   0.000     14.47365    14.86635
  --------------+----------------------------------------------------------------
  anomia71      |
        Alien71 |          1  (constrained)
          _cons |      14.13   .1158943   121.92   0.000     13.90285    14.35715
  --------------+----------------------------------------------------------------
  pwless71      |
        Alien71 |   .8486022   .0415205    20.44   0.000     .7672235    .9299808
          _cons |       14.9   .1034537   144.03   0.000     14.69723    15.10277
  --------------+----------------------------------------------------------------
  educ66        |
            SES |          1  (constrained)
          _cons |       10.9   .1014894   107.40   0.000     10.70108    11.09892
  --------------+----------------------------------------------------------------
  occstat66     |
            SES |   5.331259   .4307503    12.38   0.000     4.487004    6.175514
          _cons |      37.49   .6947112    53.96   0.000     36.12839    38.85161
----------------+----------------------------------------------------------------
 var(e.anomia67)|   4.009921   .3582978                      3.365724    4.777416
 var(e.pwless67)|   3.187468    .283374                      2.677762    3.794197
 var(e.anomia71)|   3.695593   .3911512                      3.003245     4.54755
 var(e.pwless71)|   3.621531   .3037908                      3.072483    4.268693
   var(e.educ66)|   2.943819   .5002527                      2.109908    4.107319
var(e.occstat66)|     260.63   18.24572                      227.2139    298.9605
  var(e.Alien67)|   5.301416    .483144                      4.434225    6.338201
  var(e.Alien71)|   3.737286   .3881546                      3.048951    4.581019
        var(SES)|    6.65587   .6409484                      5.511067    8.038482
---------------------------------------------------------------------------------
LR test of model vs. saturated: chi2(6)   =     71.62, Prob > chi2 = 0.0000

R-lavaan

lavaan 0.6-11 ended normally after 149 iterations

  Estimator                                         ML
  Optimization method                           NLMINB
  Number of model parameters                        15
                                                      
  Number of observations                            15
                                                      
Model Test User Model:
                                                      
  Test statistic                                26.531
  Degrees of freedom                                 6
  P-value (Chi-square)                           0.000

Model Test Baseline Model:

  Test statistic                               370.547
  Degrees of freedom                                15
  P-value                                        0.000

User Model versus Baseline Model:

  Comparative Fit Index (CFI)                    0.942
  Tucker-Lewis Index (TLI)                       0.856

Loglikelihood and Information Criteria:

  Loglikelihood user model (H0)                -82.504
  Loglikelihood unrestricted model (H1)        -69.238
                                                      
  Akaike (AIC)                                 195.008
  Bayesian (BIC)                               205.629
  Sample-size adjusted Bayesian (BIC)          159.836

Root Mean Square Error of Approximation:

  RMSEA                                          0.478
  90 Percent confidence interval - lower         0.301
  90 Percent confidence interval - upper         0.670
  P-value RMSEA <= 0.05                          0.000

Standardized Root Mean Square Residual:

  SRMR                                           0.002

Parameter Estimates:

  Standard errors                             Standard
  Information                                 Expected
  Information saturated (h1) model          Structured

Latent Variables:
                   Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
  Alien67 =~                                                            
    anomia67          1.000                               3.388    0.999
    pwless67          1.072    0.016   66.835    0.000    3.632    0.999
  Alien71 =~                                                            
    anomia71          1.000                               3.520    0.999
    pwless71          1.049    0.014   74.224    0.000    3.694    0.999
  SES =~                                                                
    educ66            1.000                               2.794    0.993
    occstat66         3.608    0.267   13.528    0.000   10.081    0.969

Regressions:
                   Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
  Alien67 ~                                                             
    SES               1.176    0.087   13.498    0.000    0.970    0.970
  Alien71 ~                                                             
    Alien67           0.996    0.060   16.505    0.000    0.959    0.959
    SES               0.053    0.073    0.727    0.467    0.042    0.042

Variances:
                   Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
   .anomia67          0.018    0.011    1.642    0.101    0.018    0.002
   .pwless67          0.024    0.013    1.810    0.070    0.024    0.002
   .anomia71          0.013    0.009    1.359    0.174    0.013    0.001
   .pwless71          0.023    0.012    1.875    0.061    0.023    0.002
   .educ66            0.104    0.153    0.680    0.497    0.104    0.013
   .occstat66         6.514    3.063    2.127    0.033    6.514    0.060
   .Alien67           0.678    0.327    2.074    0.038    0.059    0.059
   .Alien71           0.012    0.012    0.998    0.318    0.001    0.001
    SES               7.808    2.892    2.699    0.007    1.000    1.000

EDIT:

sem_sm2 <- structure(list(`_group` = structure(c(1, 1, 1, 1, 1, 1, 1, 1, 
1, 1, 1, 1, 1, 1, 1), format.stata = "%8.0g"), `_type` = structure(c(1, 
2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15), format.stata = "%8.0g"), 
    educ66 = structure(c(10.9, 3.1, 1, 0.54, -0.34, -0.41, -0.25, 
    0.53, -0.36, -0.41, -0.16, 0.54, -0.35, -0.37, -0.22), label = "Education, 1966", format.stata = "%10.0g"), 
    occstat66 = structure(c(37.49, 21.22, 0.54, 1, -0.25, -0.3, 
    -0.21, 0.82, -0.3, -0.29, -0.19, 0.81, -0.29, -0.28, -0.16
    ), label = "Occupational status, 1966", format.stata = "%10.0g"), 
    anomia66 = structure(c(13.7, 3.6, -0.34, -0.25, 1, 0.7, 0.27, 
    -0.26, 0.67, 0.55, 0.27, -0.25, 0.53, 0.42, 0.26), label = "Anomia, 1966", format.stata = "%10.0g"), 
    pwless66 = structure(c(14.85, 3.31, -0.41, -0.3, 0.7, 1, 
    0.31, -0.31, 0.55, 0.63, 0.26, -0.25, 0.5, 0.51, 0.26), label = "Powerlessness, 1966", format.stata = "%10.0g"), 
    socdist66 = structure(c(1.11, 1.4, -0.25, -0.21, 0.27, 0.31, 
    1, -0.14, 0.26, 0.28, 0.49, -0.21, 0.27, 0.24, 0.47), label = "Latin American social distance, 1966", format.stata = "%10.0g"), 
    occstat67 = structure(c(36.72, 21, 0.53, 0.82, -0.26, -0.31, 
    -0.14, 1, -0.3, -0.33, -0.11, 0.78, -0.32, -0.26, -0.14), label = "Occupational status, 1967", format.stata = "%10.0g"), 
    anomia67 = structure(c(13.61, 3.44, -0.36, -0.3, 0.67, 0.55, 
    0.26, -0.3, 1, 0.66, 0.25, -0.28, 0.56, 0.44, 0.25), label = "Anomia, 1967", format.stata = "%10.0g"), 
    pwless67 = structure(c(14.67, 3.06, -0.41, -0.29, 0.55, 0.63, 
    0.28, -0.33, 0.66, 1, 0.28, -0.26, 0.47, 0.52, 0.27), label = "Powerlessness, 1967", format.stata = "%10.0g"), 
    socdist67 = structure(c(0.89, 1.27, -0.16, -0.19, 0.27, 0.26, 
    0.49, -0.11, 0.25, 0.28, 1, -0.18, 0.29, 0.23, 0.41), label = "Latin American social distance, 1967", format.stata = "%10.0g"), 
    occstat71 = structure(c(37.47, 20.98, 0.54, 0.81, -0.25, 
    -0.25, -0.21, 0.78, -0.28, -0.26, -0.18, 1, -0.31, -0.28, 
    -0.19), label = "Occupational status, 1971", format.stata = "%10.0g"), 
    anomia71 = structure(c(14.13, 3.54, -0.35, -0.29, 0.53, 0.5, 
    0.27, -0.32, 0.56, 0.47, 0.29, -0.31, 1, 0.67, 0.29), label = "Anomia, 1971", format.stata = "%10.0g"), 
    pwless71 = structure(c(14.9, 3.16, -0.37, -0.28, 0.42, 0.51, 
    0.24, -0.26, 0.44, 0.52, 0.23, -0.28, 0.67, 1, 0.28), label = "Powerlessness, 1971", format.stata = "%10.0g"), 
    socdist71 = structure(c(0.78, 1.25, -0.22, -0.16, 0.26, 0.26, 
    0.47, -0.14, 0.25, 0.27, 0.41, -0.19, 0.29, 0.28, 1), label = "Latin American social distance, 1971", format.stata = "%10.0g")), class = c("tbl_df", 
"tbl", "data.frame"), row.names = c(NA, -15L), label = "Structural model with measurement component", notes = c("Summary statistics data from Wheaton, B., Muthen B., Alwin, D.,           & Summers, G., 1977, \"Assessing reliability and stability in           panel models\", in D. R. Heise (Ed.), _Sociological Methodology           1977_ (pp. 84-136), San Francisco: Jossey-Bass, Inc.", 
"Intended use: Create structural model relating Alienation in 1971, Alienation in 1967, and SES in 1966.", 
"3", "Four indicators each measured in 1966, 1967, and 1981, plus another indicator (educ66) measured only in 1966."
))

Andrew Brown · Accepted Answer

Assuming you are using haven to read this .dta file then you are using sample data that are not intended for use in analysis, which is clearly stated on this page: https://www.stata-press.com/data/r17/sem.html

Datasets used in the Stata documentation were selected to demonstrate how to use Stata. Some datasets have been altered to explain a particular feature. Do not use these datasets for analysis.

Unfortunately the sample data only include 15 of the 932 observations shown in this example

x <- haven::read_dta("https://www.stata-press.com/data/r17/sem_sm2.dta")
sum(sapply(x, length))
#> [1] 225
sum(nrow(x))
#> [1] 15

Attempting to recreate official Stata SEM example in the R-package lavaan

Stata syntax

R-lavaan syntax

Stata output

R-lavaan

EDIT:

Answers (2)

In R (puts .dta in working directory)

Stata

Stata Output

Lavaan Output

Related Questions