Reputation: 1
Problem Introduction
Hi everyone,
I’m working with a clean dataset of N = 724 participants who completed a personality test based on the HEXACO model. The test is designed to measure 24 sub-components that combine into 6 main personality traits, with around 15-16 questions per sub-component.
I'm performing a Confirmatory Factor Analysis (CFA) to validate the constructs, but I’ve encountered a significant issue: my data strongly deviates from multivariate normality (HZ = 1.000, p < 0.001). This deviation suggests that a standard CFA approach won’t work, so I need an estimator that can handle non-normal data. I’m using lavaan::cfa() in R for the analysis.
From my research, I found that maximum likelihood with robust standard errors and a scaled test statistic (MLR) is often recommended for such cases. However, since I’m new to this, I’d appreciate any advice on whether MLR is the best option or if there are better alternatives. Additionally, my model has trouble converging, which makes me wonder if I need a different estimator or if there’s another issue with my approach.
Data details
The response scale ranges from -5 to 5. Although ordinal data (like Likert scales) is usually treated as non-continuous, I’ve read that when the range is wider (e.g., -5 to 5), treating it as continuous is sometimes appropriate. I’d like to confirm if this is valid for my data.
During data cleaning, I removed participants who displayed extreme response styles (e.g., more than 50% of their answers were at the scale’s extremes or at the midpoint).
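The filter described above can be sketched roughly as follows. This is a minimal illustration, not the exact cleaning script: `flag_response_styles` is a hypothetical helper, the extremes are assumed to be -5 and 5, the midpoint is assumed to be 0, and the 50% threshold is assumed to apply to each share separately.

```r
# Hypothetical helper: flag respondents whose share of extreme answers
# (-5 or 5) or of midpoint answers (0) exceeds the threshold.
flag_response_styles <- function(responses, threshold = 0.5) {
  m <- as.matrix(responses)
  extreme_share  <- rowMeans(abs(m) == 5, na.rm = TRUE)
  midpoint_share <- rowMeans(m == 0, na.rm = TRUE)
  extreme_share > threshold | midpoint_share > threshold
}

# Toy data: four respondents, four items (made up for illustration)
toy <- data.frame(
  Q1 = c(5, 0, 2, -5),
  Q2 = c(-5, 0, -1, 3),
  Q3 = c(5, 0, 4, 0),
  Q4 = c(1, 3, -3, 5)
)

flagged <- flag_response_styles(toy)
cleaned <- toy[!flagged, ]
print(flagged)
print(cleaned)
```

A respondent sitting exactly at 50% is kept here because the comparison is strict; switching to `>=` would change that.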
In summary, I have two questions:
Is MLR the best estimator for CFA when the data violates multivariate normality, or are there better alternatives?
Given the -5 to 5 scale, should I treat my data as continuous, or would it be more appropriate to handle it as ordinal?
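For reference, the two options in the second question could be compared side by side along these lines. This is only a sketch on simulated data: the one-factor model, the variable names `x1` to `x4`, and the way the scores are cut into an 11-point scale are all assumptions made for illustration, and WLSMV is shown as the usual ordinal counterpart in lavaan, not as a recommendation.

```r
library(lavaan)

set.seed(1)

# Simulate continuous scores from an assumed one-factor population model
pop_model <- 'f =~ 0.8*x1 + 0.7*x2 + 0.6*x3 + 0.7*x4'
latent_scores <- simulateData(pop_model, sample.nobs = 724)

# Coarsen to an 11-point scale from -5 to 5 (illustrative only)
toy <- as.data.frame(lapply(latent_scores, function(x) {
  pmax(pmin(round(2 * x), 5), -5)
}))

fit_model <- 'f =~ x1 + x2 + x3 + x4'

# Option 1: treat the items as continuous, robust ML
fit_cont <- cfa(fit_model, data = toy, estimator = "MLR")

# Option 2: treat the items as ordered categories
fit_ord <- cfa(fit_model, data = toy, estimator = "WLSMV",
               ordered = names(toy))

print(fitMeasures(fit_cont, c("cfi.robust", "rmsea.robust")))
print(fitMeasures(fit_ord, c("cfi.scaled", "rmsea.scaled")))
```

With 11 categories the ordinal version estimates 10 thresholds per item, so on the full 100-item model it is considerably heavier than the continuous version.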
Thanks in advance for any advice!
Once again, I’m running a CFA using lavaan::cfa() with estimator = "MLR", but the model has convergence issues.
Model Call
The model call:
first_order_fit <- cfa(first_order_model,
                       data = final_model_data,
                       estimator = "MLR",
                       verbose = TRUE)
Model Syntax
The syntax for the "first_order_model" follows the lavaan style definition:
first_order_model <- '
a_flexibility =~ Q239 + Q274 + Q262 + Q183
a_forgiveness =~ Q200 + Q271 + Q264 + Q222
a_gentleness =~ Q238 + Q244 + Q272 + Q247
a_patience =~ Q282 + Q253 + Q234 + Q226
c_diligence =~ Q267 + Q233 + Q195 + Q193
c_organization =~ Q260 + Q189 + Q275 + Q228
c_perfectionism =~ Q249 + Q210 + Q263 + Q216 + Q214
c_prudence =~ Q265 + Q270 + Q254 + Q259
e_anxiety =~ Q185 + Q202 + Q208 + Q243 + Q261
e_dependence =~ Q273 + Q236 + Q279 + Q211 + Q204
e_fearfulness =~ Q217 + Q221 + Q213 + Q205
e_sentimentality =~ Q229 + Q251 + Q237 + Q209
h_fairness =~ Q277 + Q192 + Q219 + Q203
h_greed_avoidance =~ Q188 + Q215 + Q255 + Q231
h_modesty =~ Q266 + Q206 + Q258 + Q207
h_sincerity =~ Q199 + Q223 + Q225 + Q240
o_aesthetic_appreciation =~ Q196 + Q268 + Q281
o_creativity =~ Q212 + Q191 + Q194 + Q242 + Q256
o_inquisitivness =~ Q278 + Q246 + Q280 + Q186
o_unconventionality =~ Q227 + Q235 + Q250 + Q201
x_livelyness =~ Q220 + Q252 + Q276 + Q230
x_sociability =~ Q218 + Q224 + Q241 + Q232
x_social_boldness =~ Q184 + Q197 + Q190 + Q187 + Q245
x_social_self_esteem =~ Q198 + Q269 + Q248 + Q257
'
Note: I did not assign any starting values or fix any of the covariances.
Convergence Status
The optimizer reports convergence status 0 with the message relative convergence (4) after 2493 iterations, but I am not sure the solution is stable. In my case, the call then keeps processing endlessly and never returns:
convergence status (0=ok): 0
nlminb message says: relative convergence (4)
number of iterations: 2493
number of function evaluations [objective, gradient]: 3300 2494
lavoptim ... done.
lavimplied ... done.
lavloglik ... done.
lavbaseline ...
Sample Data
You can generate similar data using this code:
set.seed(123)
n_participants <- 200
n_questions <- 100
sample_data <- data.frame(
  matrix(
    sample(-5:5, n_participants * n_questions, replace = TRUE),
    nrow = n_participants,
    ncol = n_questions
  )
)
colnames(sample_data) <- paste0("Q", 183:282)
Assumption of multivariate normality
To test for multivariate normality, I used:
mvn_result <- mvn(data = sample_data, mvnTest = "mardia", multivariatePlot = "qq")
For a formal test:
mvn_result_hz <- mvn(data = final_model_data, mvnTest = "hz")
Upvotes: 0
Views: 60
Reputation: 690
So the principal issue you want to solve is the nonconvergence, right?
A common approach with complex statistical models is to (1) vary the options passed to the optimizer and/or (2) try other optimizers (i.e. fitting algorithms).
The default optimizer in lavaan::cfa() is "nlminb", and you can change its default options by adding a list of options to your call to cfa like this:
first_order_fit <- cfa(first_order_model,
                       data = final_model_data,
                       estimator = "MLR",
                       verbose = TRUE,
                       control = list(iter.max = 20000,
                                      rel.tol = 1e-9))
See the nlminb help page to find out what options you have and what their defaults are. My first try would be to increase iter.max and maybe slightly increase rel.tol (i.e. make it a slightly less tiny number), as shown above. Your run already used 2493 iterations, so any iter.max you set has to be well above that count; a smaller value would only stop the optimizer earlier.
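To get a feel for what these control options do, it can help to call nlminb directly on a small problem. The sketch below uses the Rosenbrock function purely as a stand-in objective (it has nothing to do with your CFA model) and shows how the iteration cap affects the convergence code and message, which are the same two fields lavaan prints in its verbose output.

```r
# Stand-in objective: the Rosenbrock function, minimum at c(1, 1)
rosenbrock <- function(p) 100 * (p[2] - p[1]^2)^2 + (1 - p[1])^2
start <- c(-1.2, 1)

runs <- list(
  # Deliberately tiny iteration cap: the optimizer is stopped early
  capped  = nlminb(start, rosenbrock, control = list(iter.max = 5)),
  # Generous cap plus a slightly looser relative tolerance
  relaxed = nlminb(start, rosenbrock,
                   control = list(iter.max = 500, rel.tol = 1e-9))
)

for (name in names(runs)) {
  run <- runs[[name]]
  cat(name, ": convergence =", run$convergence,
      "| iterations =", run$iterations,
      "| message:", run$message, "\n")
}
```

A convergence code of 0 means nlminb considers the run successful; anything else means it stopped for another reason, such as hitting the iteration limit.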
If you cannot get convergence (or do get convergence but the estimates are absurd), then you may want to try using another optimizer.
To do this, try the following code:
first_order_fit <- cfa(first_order_model,
                       data = final_model_data,
                       estimator = "MLR",
                       verbose = TRUE,
                       optim.method = "BFGS")  # or "L-BFGS-B"
Again, you can try different options for these optimizers by adding a control = list(...). Be aware that the options available and their names may differ.
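If you want to compare optimizers systematically, something like the following may help. It is a sketch only: it uses lavaan's built-in HolzingerSwineford1939 data and the usual three-factor tutorial model as stand-ins for your data and model, and it simply records whether each optimizer reports convergence and how many iterations it needed.

```r
library(lavaan)

# Stand-in model and data (not the HEXACO model from the question)
hs_model <- '
  visual  =~ x1 + x2 + x3
  textual =~ x4 + x5 + x6
  speed   =~ x7 + x8 + x9
'

optimizers <- c("nlminb", "BFGS", "L-BFGS-B")

# Fit the same model once per optimizer
fits <- lapply(optimizers, function(method) {
  cfa(hs_model,
      data = HolzingerSwineford1939,
      estimator = "MLR",
      optim.method = method)
})
names(fits) <- optimizers

# Collect the convergence flag and iteration count for each fit
summary_table <- data.frame(
  converged  = sapply(fits, lavInspect, what = "converged"),
  iterations = sapply(fits, lavInspect, what = "iterations")
)
print(summary_table)
```

If the optimizers that converge also agree on the parameter estimates, that is some reassurance that the solution is not an artifact of one particular algorithm.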
Upvotes: 0