Reputation: 2341
I want to recreate this official Stata example with the R-package lavaan
, because I want to know if the same model leads to the same result in both packages.
I am using the same data (I uploaded the data using Stata, ten saved is as a .dta
and uploaded that into R
) and graphically the model structure looks identical. My issue is that the estimates are not remotely close.
Nevertheless I am not completely sure whether I am doing the same thing in both regressions (perhaps because the assumptions which follow from the two syntaxes are different).
Could someone point out to me if I am doing anything different?
webuse sem_sm2, clear
sem (anomia67 pwless67 <- Alien67) (anomia71 pwless71 <- Alien71) (Alien67 <- SES) (Alien71 <- Alien67 SES) ( SES -> educ occstat66), nolog
library(haven)
library(lavaan)
library(lavaanPlot)
model <- '
Alien67 =~ anomia67 + pwless67
Alien71 =~ anomia71 + pwless71
Alien67 ~ SES
Alien71 ~ Alien67 + SES
SES =~ educ66 + occstat66
'
model_fit <- sem(model, data=example)
summary(model_fit , standardized=TRUE, fit.measures=TRUE)
lavaanPlot(model = model_fit, node_options = list(shape = "box", fontname = "Helvetica"), edge_options = list(color = "grey"), coefs = FALSE)
Endogenous variables
Measurement: anomia67 pwless67 anomia71 pwless71 educ66 occstat66
Latent: Alien67 Alien71
Exogenous variables
Latent: SES
Structural equation model Number of obs = 932
Estimation method = ml
Log likelihood = -15246.469
( 1) [anomia67]Alien67 = 1
( 2) [anomia71]Alien71 = 1
( 3) [educ66]SES = 1
---------------------------------------------------------------------------------
| OIM
| Coef. Std. Err. z P>|z| [95% Conf. Interval]
----------------+----------------------------------------------------------------
Structural |
Alien67 |
SES | -.6140404 .0562407 -10.92 0.000 -.7242701 -.5038107
--------------+----------------------------------------------------------------
Alien71 |
Alien67 | .7046342 .0533512 13.21 0.000 .6000678 .8092007
SES | -.1744153 .0542489 -3.22 0.001 -.2807413 -.0680894
----------------+----------------------------------------------------------------
Measurement |
anomia67 |
Alien67 | 1 (constrained)
_cons | 13.61 .1126205 120.85 0.000 13.38927 13.83073
--------------+----------------------------------------------------------------
pwless67 |
Alien67 | .8884887 .0431565 20.59 0.000 .8039034 .9730739
_cons | 14.67 .1001798 146.44 0.000 14.47365 14.86635
--------------+----------------------------------------------------------------
anomia71 |
Alien71 | 1 (constrained)
_cons | 14.13 .1158943 121.92 0.000 13.90285 14.35715
--------------+----------------------------------------------------------------
pwless71 |
Alien71 | .8486022 .0415205 20.44 0.000 .7672235 .9299808
_cons | 14.9 .1034537 144.03 0.000 14.69723 15.10277
--------------+----------------------------------------------------------------
educ66 |
SES | 1 (constrained)
_cons | 10.9 .1014894 107.40 0.000 10.70108 11.09892
--------------+----------------------------------------------------------------
occstat66 |
SES | 5.331259 .4307503 12.38 0.000 4.487004 6.175514
_cons | 37.49 .6947112 53.96 0.000 36.12839 38.85161
----------------+----------------------------------------------------------------
var(e.anomia67)| 4.009921 .3582978 3.365724 4.777416
var(e.pwless67)| 3.187468 .283374 2.677762 3.794197
var(e.anomia71)| 3.695593 .3911512 3.003245 4.54755
var(e.pwless71)| 3.621531 .3037908 3.072483 4.268693
var(e.educ66)| 2.943819 .5002527 2.109908 4.107319
var(e.occstat66)| 260.63 18.24572 227.2139 298.9605
var(e.Alien67)| 5.301416 .483144 4.434225 6.338201
var(e.Alien71)| 3.737286 .3881546 3.048951 4.581019
var(SES)| 6.65587 .6409484 5.511067 8.038482
---------------------------------------------------------------------------------
LR test of model vs. saturated: chi2(6) = 71.62, Prob > chi2 = 0.0000
lavaan 0.6-11 ended normally after 149 iterations
Estimator ML
Optimization method NLMINB
Number of model parameters 15
Number of observations 15
Model Test User Model:
Test statistic 26.531
Degrees of freedom 6
P-value (Chi-square) 0.000
Model Test Baseline Model:
Test statistic 370.547
Degrees of freedom 15
P-value 0.000
User Model versus Baseline Model:
Comparative Fit Index (CFI) 0.942
Tucker-Lewis Index (TLI) 0.856
Loglikelihood and Information Criteria:
Loglikelihood user model (H0) -82.504
Loglikelihood unrestricted model (H1) -69.238
Akaike (AIC) 195.008
Bayesian (BIC) 205.629
Sample-size adjusted Bayesian (BIC) 159.836
Root Mean Square Error of Approximation:
RMSEA 0.478
90 Percent confidence interval - lower 0.301
90 Percent confidence interval - upper 0.670
P-value RMSEA <= 0.05 0.000
Standardized Root Mean Square Residual:
SRMR 0.002
Parameter Estimates:
Standard errors Standard
Information Expected
Information saturated (h1) model Structured
Latent Variables:
Estimate Std.Err z-value P(>|z|) Std.lv Std.all
Alien67 =~
anomia67 1.000 3.388 0.999
pwless67 1.072 0.016 66.835 0.000 3.632 0.999
Alien71 =~
anomia71 1.000 3.520 0.999
pwless71 1.049 0.014 74.224 0.000 3.694 0.999
SES =~
educ66 1.000 2.794 0.993
occstat66 3.608 0.267 13.528 0.000 10.081 0.969
Regressions:
Estimate Std.Err z-value P(>|z|) Std.lv Std.all
Alien67 ~
SES 1.176 0.087 13.498 0.000 0.970 0.970
Alien71 ~
Alien67 0.996 0.060 16.505 0.000 0.959 0.959
SES 0.053 0.073 0.727 0.467 0.042 0.042
Variances:
Estimate Std.Err z-value P(>|z|) Std.lv Std.all
.anomia67 0.018 0.011 1.642 0.101 0.018 0.002
.pwless67 0.024 0.013 1.810 0.070 0.024 0.002
.anomia71 0.013 0.009 1.359 0.174 0.013 0.001
.pwless71 0.023 0.012 1.875 0.061 0.023 0.002
.educ66 0.104 0.153 0.680 0.497 0.104 0.013
.occstat66 6.514 3.063 2.127 0.033 6.514 0.060
.Alien67 0.678 0.327 2.074 0.038 0.059 0.059
.Alien71 0.012 0.012 0.998 0.318 0.001 0.001
SES 7.808 2.892 2.699 0.007 1.000 1.000
sem_sm2 <- structure(list(`_group` = structure(c(1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1), format.stata = "%8.0g"), `_type` = structure(c(1,
2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15), format.stata = "%8.0g"),
educ66 = structure(c(10.9, 3.1, 1, 0.54, -0.34, -0.41, -0.25,
0.53, -0.36, -0.41, -0.16, 0.54, -0.35, -0.37, -0.22), label = "Education, 1966", format.stata = "%10.0g"),
occstat66 = structure(c(37.49, 21.22, 0.54, 1, -0.25, -0.3,
-0.21, 0.82, -0.3, -0.29, -0.19, 0.81, -0.29, -0.28, -0.16
), label = "Occupational status, 1966", format.stata = "%10.0g"),
anomia66 = structure(c(13.7, 3.6, -0.34, -0.25, 1, 0.7, 0.27,
-0.26, 0.67, 0.55, 0.27, -0.25, 0.53, 0.42, 0.26), label = "Anomia, 1966", format.stata = "%10.0g"),
pwless66 = structure(c(14.85, 3.31, -0.41, -0.3, 0.7, 1,
0.31, -0.31, 0.55, 0.63, 0.26, -0.25, 0.5, 0.51, 0.26), label = "Powerlessness, 1966", format.stata = "%10.0g"),
socdist66 = structure(c(1.11, 1.4, -0.25, -0.21, 0.27, 0.31,
1, -0.14, 0.26, 0.28, 0.49, -0.21, 0.27, 0.24, 0.47), label = "Latin American social distance, 1966", format.stata = "%10.0g"),
occstat67 = structure(c(36.72, 21, 0.53, 0.82, -0.26, -0.31,
-0.14, 1, -0.3, -0.33, -0.11, 0.78, -0.32, -0.26, -0.14), label = "Occupational status, 1967", format.stata = "%10.0g"),
anomia67 = structure(c(13.61, 3.44, -0.36, -0.3, 0.67, 0.55,
0.26, -0.3, 1, 0.66, 0.25, -0.28, 0.56, 0.44, 0.25), label = "Anomia, 1967", format.stata = "%10.0g"),
pwless67 = structure(c(14.67, 3.06, -0.41, -0.29, 0.55, 0.63,
0.28, -0.33, 0.66, 1, 0.28, -0.26, 0.47, 0.52, 0.27), label = "Powerlessness, 1967", format.stata = "%10.0g"),
socdist67 = structure(c(0.89, 1.27, -0.16, -0.19, 0.27, 0.26,
0.49, -0.11, 0.25, 0.28, 1, -0.18, 0.29, 0.23, 0.41), label = "Latin American social distance, 1967", format.stata = "%10.0g"),
occstat71 = structure(c(37.47, 20.98, 0.54, 0.81, -0.25,
-0.25, -0.21, 0.78, -0.28, -0.26, -0.18, 1, -0.31, -0.28,
-0.19), label = "Occupational status, 1971", format.stata = "%10.0g"),
anomia71 = structure(c(14.13, 3.54, -0.35, -0.29, 0.53, 0.5,
0.27, -0.32, 0.56, 0.47, 0.29, -0.31, 1, 0.67, 0.29), label = "Anomia, 1971", format.stata = "%10.0g"),
pwless71 = structure(c(14.9, 3.16, -0.37, -0.28, 0.42, 0.51,
0.24, -0.26, 0.44, 0.52, 0.23, -0.28, 0.67, 1, 0.28), label = "Powerlessness, 1971", format.stata = "%10.0g"),
socdist71 = structure(c(0.78, 1.25, -0.22, -0.16, 0.26, 0.26,
0.47, -0.14, 0.25, 0.27, 0.41, -0.19, 0.29, 0.28, 1), label = "Latin American social distance, 1971", format.stata = "%10.0g")), class = c("tbl_df",
"tbl", "data.frame"), row.names = c(NA, -15L), label = "Structural model with measurement component", notes = c("Summary statistics data from Wheaton, B., Muthen B., Alwin, D., & Summers, G., 1977, \"Assessing reliability and stability in panel models\", in D. R. Heise (Ed.), _Sociological Methodology 1977_ (pp. 84-136), San Francisco: Jossey-Bass, Inc.",
"Intended use: Create structural model relating Alienation in 1971, Alienation in 1967, and SES in 1966.",
"3", "Four indicators each measured in 1966, 1967, and 1981, plus another indicator (educ66) measured only in 1966."
))
Upvotes: 5
Views: 460
Reputation: 2341
Because of all the issues with the Stata sample data, and because I still want to know if the same model leads to the same result in both packages, instead of using lavaan
in R
to recreate the Stata example, I used Stata to recreate the lavaan
example. For this example the outcomes are the same.
library(lavaan)
library(haven)
library(lavaan) # only needed once per session
model <- '
# measurement model
ind60 =~ x1 + x2 + x3
dem60 =~ y1 + y2 + y3 + y4
dem65 =~ y5 + y6 + y7 + y8
# regressions
dem60 ~ ind60
dem65 ~ ind60 + dem60
# residual correlations
y1 ~~ y5
y2 ~~ y4 + y6
y3 ~~ y7
y4 ~~ y8
y6 ~~ y8
'
fit <- sem(model, data=PoliticalDemocracy)
summary(fit, standardized=TRUE)
write_dta(PoliticalDemocracy, getwd())
# Upload PoliticalDemocracy.dta from working directory
sem (Ind60 -> x1 x2 x3) (Dem60 -> y1 y2 y3 y4) (Dem65 -> y5 y6 y7 y8) (Dem60 <- Ind60) (Dem65 <- Ind60 Dem60), cov(e.y1*e.y5) cov(e.y2*e.y4) cov(e.y2*e.y6) cov(e.y3*e.y7) cov(e.y4*e.y8) cov(e.y6*e.y8)
Endogenous variables
Measurement: x1 x2 x3 y1 y2 y3 y4 y5 y6 y7 y8
Latent: Dem60 Dem65
Exogenous variables
Latent: Ind60
Fitting target model:
Iteration 0: log likelihood = -1573.8662
Iteration 1: log likelihood = -1563.3513
Iteration 2: log likelihood = -1551.7615
Iteration 3: log likelihood = -1547.8534
Iteration 4: log likelihood = -1547.7913
Iteration 5: log likelihood = -1547.7909
Iteration 6: log likelihood = -1547.7909
Structural equation model Number of obs = 75
Estimation method = ml
Log likelihood = -1547.7909
( 1) [y1]Dem60 = 1
( 2) [y5]Dem65 = 1
( 3) [x1]Ind60 = 1
-------------------------------------------------------------------------------
| OIM
| Coef. Std. Err. z P>|z| [95% Conf. Interval]
--------------+----------------------------------------------------------------
Structural |
Dem60 |
Ind60 | 1.483 .3972561 3.73 0.000 .704392 2.261607
------------+----------------------------------------------------------------
Dem65 |
Dem60 | .8373443 .0987884 8.48 0.000 .6437227 1.030966
Ind60 | .5723368 .2337326 2.45 0.014 .1142294 1.030444
--------------+----------------------------------------------------------------
Measurement |
x1 |
Ind60 | 1 (constrained)
_cons | 5.054384 .0840624 60.13 0.000 4.889625 5.219143
------------+----------------------------------------------------------------
x2 |
Ind60 | 2.180368 .139012 15.68 0.000 1.907909 2.452826
_cons | 4.792195 .1732697 27.66 0.000 4.452592 5.131797
------------+----------------------------------------------------------------
x3 |
Ind60 | 1.818511 .1521907 11.95 0.000 1.520223 2.116799
_cons | 3.55769 .1612318 22.07 0.000 3.241681 3.873698
------------+----------------------------------------------------------------
y1 |
Dem60 | 1 (constrained)
_cons | 5.464667 .3018536 18.10 0.000 4.873044 6.056289
------------+----------------------------------------------------------------
y2 |
Dem60 | 1.256746 .1855 6.77 0.000 .8931728 1.62032
_cons | 4.256443 .4498704 9.46 0.000 3.374713 5.138173
------------+----------------------------------------------------------------
y3 |
Dem60 | 1.057716 .1483352 7.13 0.000 .7669848 1.348448
_cons | 6.56311 .3758849 17.46 0.000 5.826389 7.299831
------------+----------------------------------------------------------------
y4 |
Dem60 | 1.264786 .1507281 8.39 0.000 .9693647 1.560208
_cons | 4.452533 .3839095 11.60 0.000 3.700084 5.204982
------------+----------------------------------------------------------------
y5 |
Dem65 | 1 (constrained)
_cons | 5.136252 .3005112 17.09 0.000 4.547261 5.725243
------------+----------------------------------------------------------------
y6 |
Dem65 | 1.185697 .1713491 6.92 0.000 .8498588 1.521535
_cons | 2.978074 .3859332 7.72 0.000 2.221659 3.734489
------------+----------------------------------------------------------------
y7 |
Dem65 | 1.279513 .1603798 7.98 0.000 .965174 1.593851
_cons | 6.196264 .3771994 16.43 0.000 5.456967 6.935561
------------+----------------------------------------------------------------
y8 |
Dem65 | 1.265948 .1632167 7.76 0.000 .9460489 1.585847
_cons | 4.04339 .3713215 10.89 0.000 3.315613 4.771166
--------------+----------------------------------------------------------------
var(e.x1)| .0815494 .0197151 .0507735 .1309798
var(e.x2)| .1198064 .0699663 .0381402 .3763369
var(e.x3)| .4667027 .0891824 .3209105 .6787294
var(e.y1)| 1.891395 .4687497 1.163661 3.074241
var(e.y2)| 7.372868 1.345726 5.15551 10.5439
var(e.y3)| 5.067469 .968278 3.484543 7.369471
var(e.y4)| 3.147905 .7557987 1.966308 5.039547
var(e.y5)| 2.350971 .4887454 1.564193 3.533494
var(e.y6)| 4.953964 .8954399 3.476133 7.060077
var(e.y7)| 3.431375 .7278048 2.264257 5.200086
var(e.y8)| 3.254083 .7070144 2.125632 4.981603
var(e.Dem60)| 3.956032 .944614 2.477488 6.31696
var(e.Dem65)| .1724814 .2203083 .0141096 2.108488
var(Ind60)| .4484374 .0867427 .3069383 .6551677
--------------+----------------------------------------------------------------
cov(e.y1,e.y5)| .6236715 .368993 1.69 0.091 -.0995414 1.346884
cov(e.y2,e.y4)| 1.313114 .6987846 1.88 0.060 -.056479 2.682706
cov(e.y2,e.y6)| 2.15286 .7262961 2.96 0.003 .7293463 3.576375
cov(e.y3,e.y7)| .7949629 .6211629 1.28 0.201 -.4224939 2.01242
cov(e.y4,e.y8)| .3482249 .4578603 0.76 0.447 -.5491648 1.245614
cov(e.y6,e.y8)| 1.356165 .5719805 2.37 0.018 .2351034 2.477226
-------------------------------------------------------------------------------
LR test of model vs. saturated: chi2(35) = 38.13, Prob > chi2 = 0.3292
lavaan 0.6-8 ended normally after 68 iterations
Estimator ML
Optimization method NLMINB
Number of model parameters 31
Number of observations 75
Model Test User Model:
Test statistic 38.125
Degrees of freedom 35
P-value (Chi-square) 0.329
Parameter Estimates:
Standard errors Standard
Information Expected
Information saturated (h1) model Structured
Latent Variables:
Estimate Std.Err z-value P(>|z|) Std.lv Std.all
ind60 =~
x1 1.000 0.670 0.920
x2 2.180 0.139 15.742 0.000 1.460 0.973
x3 1.819 0.152 11.967 0.000 1.218 0.872
dem60 =~
y1 1.000 2.223 0.850
y2 1.257 0.182 6.889 0.000 2.794 0.717
y3 1.058 0.151 6.987 0.000 2.351 0.722
y4 1.265 0.145 8.722 0.000 2.812 0.846
dem65 =~
y5 1.000 2.103 0.808
y6 1.186 0.169 7.024 0.000 2.493 0.746
y7 1.280 0.160 8.002 0.000 2.691 0.824
y8 1.266 0.158 8.007 0.000 2.662 0.828
Regressions:
Estimate Std.Err z-value P(>|z|) Std.lv Std.all
dem60 ~
ind60 1.483 0.399 3.715 0.000 0.447 0.447
dem65 ~
ind60 0.572 0.221 2.586 0.010 0.182 0.182
dem60 0.837 0.098 8.514 0.000 0.885 0.885
Covariances:
Estimate Std.Err z-value P(>|z|) Std.lv Std.all
.y1 ~~
.y5 0.624 0.358 1.741 0.082 0.624 0.296
.y2 ~~
.y4 1.313 0.702 1.871 0.061 1.313 0.273
.y6 2.153 0.734 2.934 0.003 2.153 0.356
.y3 ~~
.y7 0.795 0.608 1.308 0.191 0.795 0.191
.y4 ~~
.y8 0.348 0.442 0.787 0.431 0.348 0.109
.y6 ~~
.y8 1.356 0.568 2.386 0.017 1.356 0.338
Variances:
Estimate Std.Err z-value P(>|z|) Std.lv Std.all
.x1 0.082 0.019 4.184 0.000 0.082 0.154
.x2 0.120 0.070 1.718 0.086 0.120 0.053
.x3 0.467 0.090 5.177 0.000 0.467 0.239
.y1 1.891 0.444 4.256 0.000 1.891 0.277
.y2 7.373 1.374 5.366 0.000 7.373 0.486
.y3 5.067 0.952 5.324 0.000 5.067 0.478
.y4 3.148 0.739 4.261 0.000 3.148 0.285
.y5 2.351 0.480 4.895 0.000 2.351 0.347
.y6 4.954 0.914 5.419 0.000 4.954 0.443
.y7 3.431 0.713 4.814 0.000 3.431 0.322
.y8 3.254 0.695 4.685 0.000 3.254 0.315
ind60 0.448 0.087 5.173 0.000 1.000 1.000
.dem60 3.956 0.921 4.295 0.000 0.800 0.800
.dem65 0.172 0.215 0.803 0.422 0.039 0.039
Upvotes: 2
Reputation: 1065
Assuming you are using haven to read this .dta file then you are using sample data that are not intended for use in analysis, which is clearly stated on this page: https://www.stata-press.com/data/r17/sem.html
Datasets used in the Stata documentation were selected to demonstrate how to use Stata. Some datasets have been altered to explain a particular feature. Do not use these datasets for analysis.
Unfortunately the sample data only include 15 of the 932 observations shown in this example
x <- haven::read_dta("https://www.stata-press.com/data/r17/sem_sm2.dta")
sum(sapply(x, length))
#> [1] 225
sum(nrow(x))
#> [1] 15
Upvotes: 3