Reputation: 31
I am attempting to construct a structural equation model in R for the relationships between the latent variables "aptitude" and "faculty/curriculum effectiveness" in a set of de-identified medical education data. To preserve as much of the data as possible, I want to include the scores of all the exams that a medical student takes in "blocks" during the first two years of medical school (denoted MS1 and MS2). Each block of exams covers a different category of material and has a different number of exams. Eventually, this would lead to a larger structural model assessing the relationship between the latent variables above and the USMLE STEP1 qualification exam, which assesses all of med school years 1 and 2, with the hope of identifying which blocks have a stronger relationship with the scores on this exam, mediated by the other latent variables. For ease, each exam in the data.frame all.exams is named by its block and by the order in which it was taken within that block:
head(all.exams)
   BLK1.1 BLK1.2 BLK1.3 BLK1.4 BLK2.1 BLK2.2 BLK2.3 BLK3.1 BLK3.2
66   66.7   87.8   50.0   82.4   81.8  100.0   87.2   83.3   69.7
67  100.0   95.9  100.0   97.1  100.0  100.0   94.9  100.0  100.0
68  100.0   91.9   66.7   88.2  100.0  100.0   94.9   91.7   97.0
69  100.0   93.2   83.3   95.6   81.8  100.0   97.4   95.8   93.9
70  100.0   89.2   83.3   85.3  100.0  100.0   87.2   87.5  100.0
71   91.7   90.5   83.3   88.2   90.9   83.3   94.9   95.8  100.0
   BLK3.3 BLK4.1 BLK4.2 BLK5.1 BLK5.2 MS2BLK1.1 MS2BLK1.2 MS2BLK1.3
66   81.3    100   80.3   90.5  100.0        85        81        82
67   95.8    100   94.4  100.0  100.0        99        98        96
68   87.5    100   87.3   81.0   66.7        90        93        93
69   89.6    100   88.7  100.0  100.0        93        84        90
70   85.4    100   85.9   90.5  100.0        97        87        88
71   87.5    100   90.1   95.2  100.0        95        89        89
   MS2BLK1.4 MS2BLK2.1 MS2BLK2.2 MS2BLK2.3 MS2BLK3.1 MS2BLK3.2
66      90.8        82      74.3      89.3      78.4      80.0
67     100.0        95     100.0      98.7      99.2      95.2
68      95.4        94      95.7      93.3      95.2      95.2
69      95.4        91      97.1      93.3      84.8      92.0
70      93.9        94      92.9      94.7      85.6      82.4
71      95.4        94      92.9      93.3      92.0      92.0
   MS2BLK4.1 MS2BLK4.2 MS2BLK4.3 MS2BLK5.1 MS2BLK5.2 MS2BLK5.3 STEP1
66      75.6      80.3      82.3      82.4        74        93   193
67      97.5      93.8      97.5     100.0       100        99   251
68      89.9      95.1      84.8      93.6        94        93   242
69      85.7      92.6      91.1      88.0        91        95   226
70      82.4      81.5      92.4      90.4        94        93   233
71      89.9      88.9      83.5      96.0        97        90   231
This is an ideal data set for "item parceling," since we are more interested in the factor loadings between the latent variables and each "block" of exams than in the relationship between each individual exam and each latent variable.
semTools features a function, parcelAllocation (https://www.rdocumentation.org/packages/semTools/versions/0.4-12/topics/parcelAllocation), which allows the user to combine manifest variables in a SEM into a specified number of parcels per latent variable, with a specified number of items within each parcel. According to the example included with the notes on semTools, the item syntax should look like:
item.syntax.full <- c(paste0("faculty =~ BLK1.", 1:4),
                      paste0("faculty =~ BLK2.", 1:3),
                      paste0("faculty =~ BLK3.", 1:3),
                      paste0("faculty =~ BLK4.", 1:2),
                      paste0("faculty =~ BLK5.", 1:2),
                      paste0("faculty =~ MS2BLK1.", 1:4),
                      paste0("faculty =~ MS2BLK2.", 1:3),
                      paste0("faculty =~ MS2BLK3.", 1:2),
                      paste0("faculty =~ MS2BLK4.", 1:3),
                      paste0("faculty =~ MS2BLK5.", 1:3),
                      paste0("aptitude =~ BLK1.", 1:4),
                      paste0("aptitude =~ BLK2.", 1:3),
                      paste0("aptitude =~ BLK3.", 1:3),
                      paste0("aptitude =~ BLK4.", 1:2),
                      paste0("aptitude =~ BLK5.", 1:2),
                      paste0("aptitude =~ MS2BLK1.", 1:4),
                      paste0("aptitude =~ MS2BLK2.", 1:3),
                      paste0("aptitude =~ MS2BLK3.", 1:2),
                      paste0("aptitude =~ MS2BLK4.", 1:3),
                      paste0("aptitude =~ MS2BLK5.", 1:3)
)
The lavaan syntax/style model is specified by the code:
parcel.model="
faculty=~par1+par2+par3+par4+par5+par6+par7+par8+par9+par10
aptitude=~par11+par12+par13+par14+par15+par16+par17+par18+par19+par20
"
Using the parcelAllocation function from semTools, the following code should fit a lavaan-style structural equation model with two latent variables, each measured by ten parcels, where the number of manifest items in each parcel is given by the nPerPar argument:
parcelAllocation(model=parcel.model,dataset=all.exams[,-30],nPerPar = list(c(4,3,3,2,2,4,3,2,3,3),c(4,3,3,2,2,4,3,2,3,3)),facPlc = list(apt.names,fac.names),nAlloc=20,syntax=item.syntax.full)
where,
fac.names=colnames(all.exams)
fac.names=c("faculty",fac.names[-30])
apt.names=colnames(all.exams)
apt.names=c("aptitude",apt.names[-30])
####the names of the latent variables and all of the manifest variables to be parceled- we exclude "STEP1" because it is not included in the lavaan model or the item.syntax####
However, when I run the above code I get the following error message:
Error in parcelAllocation(model = parcel.model, dataset = all.exams, nPerPar = list(c(4, :
** WARNING! ** Parcels incorrectly specified. Check input!
I have tried creating a simpler structural model, with 3 parcels per latent variable and with 3, 3 and 4 items respectively per parcel (totaling to the number of exams in the first two years of medical school (10) prior to the STEP1 examination):
parcel.model.simp="
faculty=~par1+par2+par3
aptitude=~par4+par5+par6
"
and using the appropriately adjusted parcelAllocation code:
parcelAllocation(model=parcel.model.simp,dataset=all.exams[,-30],nPerPar = list(c(3,3,4),c(3,3,4)),facPlc = list(apt.names,fac.names),nAlloc=20,syntax=item.syntax.full)
but it yields the same error message as above.
How can I get this function to effectively parcel the exams into parcels corresponding to each block? What errors seem present in my code? Any suggestions or corrections to the parcelAllocation code or critical feedback on my SEM approach to this question in general would be extremely helpful. I have exhaustively searched for examples of parceling that are this complex and troubleshooting for this error message and have found neither.
Thank you,
David
Upvotes: 3
Views: 1848
Reputation: 1250
I'm not sure how helpful this will be more than a year after the original post, but I find it hard to provide the requested help because the description does not seem to match the purpose of parcelAllocation(). You state that you want to "parcel the exams into parcels corresponding to each block", which means you do not want random parcels at all (which is what this function is for). Instead, you should simply create those parcels as new variables in your data by calculating a composite score (the sum or mean across exams within a block), then use those composites as indicators.
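As a minimal sketch of the composite-score approach, assuming the column naming shown in the question (block label, a dot, then the exam's order within the block), one mean score per block could be computed in base R along these lines. The three-row toy data frame and the names exam.cols, block.of, and block.means are made up for illustration; the block mean is just one of the two composites mentioned above (rowSums would give the sum instead).

```r
# Toy stand-in for all.exams: first three rows of two blocks from the question.
all.exams <- data.frame(
  BLK1.1 = c(66.7, 100.0, 100.0),
  BLK1.2 = c(87.8, 95.9, 91.9),
  BLK1.3 = c(50.0, 100.0, 66.7),
  BLK1.4 = c(82.4, 97.1, 88.2),
  BLK2.1 = c(81.8, 100.0, 100.0),
  BLK2.2 = c(100.0, 100.0, 100.0),
  BLK2.3 = c(87.2, 94.9, 94.9),
  STEP1  = c(193, 251, 242)
)

# Exam columns are named <block>.<order>; STEP1 is not part of any block.
exam.cols <- setdiff(colnames(all.exams), "STEP1")

# Strip the trailing ".<order>" to recover the block label of each exam.
block.of <- sub("\\.[0-9]+$", "", exam.cols)

# One composite per block: the row-wise mean of that block's exams.
# drop = FALSE keeps a data frame even if a block had a single exam.
block.means <- sapply(split(exam.cols, block.of), function(cols) {
  rowMeans(all.exams[, cols, drop = FALSE])
})
block.means <- as.data.frame(block.means)

print(block.means)
```

The resulting columns (here BLK1 and BLK2) could then be named as indicators in the lavaan model syntax in place of par1, par2, and so on, with STEP1 added back to the data if it is needed later in the structural model.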
Some other problems:
The current parcelAllocation() function has features that were not available in the documentation page you linked to (version 0.4-12). Perhaps it will be clearer how the function works if you work through the examples in the updated documentation (version 0.5-2): https://www.rdocumentation.org/packages/semTools/versions/0.5-2/topics/parcelAllocation
But again, it does not sound like you want to randomly allocate parcels, but rather to assign exams within a block to a parcel for that block. See this article about the advantages of "facet-representative parceling", which in your case would include all the within-block sources of variance in a single parcel, thus excluding those sources from the construct variance.
Good luck,
Terrence D. Jorgensen
Upvotes: 0