Reputation: 1797
Using the mirt package, I obtained (possibly) odd results for my nominal model.
library(difNLR)
library(mirt)
data("GMATtest", "GMATkey")
key <- as.numeric(as.factor(GMATkey))
data <- sapply(1:20, function(i) as.numeric(GMATtest[, i]))
colnames(data) <- paste("Item", 1:ncol(data))
scoredGMAT <- key2binary(data, key)
# 2PL IRT model for scored data
mod0 <- mirt(scoredGMAT, 1)
# nominal model for unscored data
mod1 <- mirt(data, 1, 'nominal')
# plots of characteristic curves for item 1
itemplot(mod0, 1)
itemplot(mod1, 1)
I expected that for the nominal model mod1 there would be one curve (the one for the correct answer) very similar to the curve plotted for mod0. However, it seems that the distractors have increasing probability with increasing theta, which does not seem reasonable. Of course, there could be something wrong with the data or (more probably) I'm missing something.
I have already checked the examples in the mirt help, and there the results are as I expected.
Any suggestions (what may be wrong) would be appreciated!
One last thing: I also tried to fit the 2PLNRM model, but my R session aborted. Has anybody noticed the same issue? My code:
# 2PLNRM model
mod2 <- mirt(data, 1, "2PLNRM", key = key)
coef(mod2)$`Item 1`
itemplot(mod2, 1)
EDIT:
There is an example from the mirt package:
library(mirt)
data(SAT12)
SAT12[SAT12 == 8] <- NA #set 8 as a missing value
head(SAT12)
# correct answer key
key <- c(1, 4, 5, 2, 3, 1, 2, 1, 3, 1, 2, 4, 2, 1, 5, 3, 4, 4, 1, 4, 3,
3, 4, 1, 3, 5, 1, 3, 1, 5, 4, 5)
scoredSAT12 <- key2binary(SAT12, key)
mod0 <- mirt(scoredSAT12, 1)
# use the nominal model for the first 5 items (2PL for the remaining 27)
scoredSAT12[, 1:5] <- as.matrix(SAT12[, 1:5])
mod1 <- mirt(scoredSAT12, 1, c(rep('nominal', 5), rep('2PL', 27)))
coef(mod0)$Item.1
coef(mod1)$Item.1
itemplot(mod0, 1)
itemplot(mod1, 1)
And the results are what I expected. However, when I try to fit the nominal model for all items, the curves change:
# nominal for all items
mod1 <- mirt(SAT12, 1, 'nominal')
coef(mod1)$Item.1
itemplot(mod1, 1)
So, as you suggested, it seems that theta and its interpretation changed, but why and how?
Upvotes: 1
Views: 749
Reputation: 747
@Juan Bosco is correct that this behaviour is consistent. The issue with using the nominal response model for all items is that the direction of increasing $\theta$ is arbitrary: nothing in the model says which way it should point (the response categories are 'unordered' by default, after all).
Moreover, mirt's default parameterisation assumes that the lowest/highest numerical category should be associated with low/high $\theta$ values. This kind of flipping is therefore common in multiple-choice items (where, unlike ordered rating-scale data, there should be no direct relationship between category number and $\theta$), because the model will pick the orientation that best matches these identification constraints.
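For reference, here is a sketch of the response function behind this argument. It assumes the nominal parameterisation described in mirt's documentation, with $K$ response categories and, by default, $ak_0 = 0$, $ak_{K-1} = K-1$ and $d_0 = 0$ fixed for identification (worth checking against your installed version):

$$P(x = k \mid \theta) = \frac{\exp(ak_k \, a_1 \theta + d_k)}{\sum_{j=0}^{K-1} \exp(ak_j \, a_1 \theta + d_j)}, \qquad k = 0, \dots, K-1$$

Only the product $a_1 \theta$ enters the formula, so replacing $(a_1, \theta)$ with $(-a_1, -\theta)$ leaves every category probability unchanged. Because the latent distribution is symmetric around zero, the two orientations fit the data equally well, and the default constraints on $ak_0$ and $ak_{K-1}$ do not tie the orientation to the answer key.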
To fix this, simply redefine the scoring constraints used by mirt by moving the highest fixed scoring coefficient to the category given by the actual scoring key. Like so:
# data.frame of starting values (nothing is estimated yet)
sv <- mirt(data, 1, 'nominal', pars = 'values')
head(sv)
# reset all scoring coefficients (ak*) to 0 and mark them as freely estimated
sv$value[grepl('ak', sv$name)] <- 0
sv$est[grepl('ak', sv$name)] <- TRUE
nms <- colnames(data)
for (i in seq_along(nms)) {
  # fix the scoring coefficient of the keyed (correct) category at 3
  pick <- paste0('ak', key[i] - 1)
  index <- sv$item == nms[i] & sv$name == pick
  sv[index, 'value'] <- 3
  sv[index, 'est'] <- FALSE
  # fix one other, arbitrarily chosen category at 0 as the lowest
  if (pick == 'ak0') {
    pick2 <- 'ak3'
  } else {
    pick2 <- paste0('ak', key[i] - 2)
  }
  index2 <- sv$item == nms[i] & sv$name == pick2
  sv[index2, 'est'] <- FALSE
}
#estimate
mod2 <- mirt(data, 1, 'nominal', pars=sv)
plot(mod2, type = 'trace')
itemplot(mod2, 1)
coef(mod2, simplify=TRUE)
At the very least, this tells the model which category is the highest, and therefore provides enough information to arrive at a more appropriate orientation. Note that it doesn't really affect the interpretation of the model per se, because all that happens is that the slopes are multiplied by -1 and the scoring coefficients are adjusted accordingly. HTH.
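To make the "slopes are multiplied by -1 and the scoring coefs are adjusted" remark concrete, here is a minimal base-R sketch. The helper `nominal_probs()` and all parameter values are made up for illustration (they are not estimates from the GMAT data), and the formula assumes the nominal parameterisation described in mirt's documentation.

```r
# Hypothetical helper: category probabilities of a nominal item at one theta,
# assuming P(k) is proportional to exp(ak_k * a1 * theta + d_k).
nominal_probs <- function(theta, a1, ak, d) {
  z <- a1 * ak * theta + d
  ez <- exp(z - max(z))  # subtract the maximum for numerical stability
  ez / sum(ez)
}

# Made-up parameters for a 4-category item under the default constraints
# (ak0 = 0, ak3 = 3, d0 = 0) with a negative slope.
a1 <- -1.2
ak <- c(0, 0.5, 1, 3)
d  <- c(0, 0.4, -0.3, 0.8)

# Re-expressed version: slope multiplied by -1, scoring coefficients
# reversed so that the first category now carries the value 3.
a1_new <- -a1
ak_new <- 3 - ak

theta_grid <- seq(-3, 3, by = 0.5)
p_old <- sapply(theta_grid, nominal_probs, a1 = a1, ak = ak, d = d)
p_new <- sapply(theta_grid, nominal_probs, a1 = a1_new, ak = ak_new, d = d)

# Both parameter sets describe the same trace lines.
all.equal(p_old, p_new)
```

The two linear predictors differ only by the term `3 * a1 * theta`, which is the same for every category and therefore cancels when the probabilities are normalised.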
Upvotes: 1
Reputation: 1797
Well, as suggested by Juan, the problem is that the estimate of theta changes when a different IRT model is used. Moreover, there is a clear connection between the estimates from the 2PL and nominal models.
library(difNLR)
library(mirt)
data("GMATtest", "GMATkey")
key <- as.numeric(as.factor(GMATkey))
data <- sapply(1:20, function(i) as.numeric(GMATtest[, i]))
colnames(data) <- paste("Item", 1:ncol(data))
scoredGMAT <- key2binary(data, key)
# 2PL IRT model for scored data
mod0 <- mirt(scoredGMAT, 1)
# nominal model for unscored data
mod1_all <- mirt(data, 1, 'nominal')
# nominal model for only first item
df <- data.frame(data[, 1], scoredGMAT[, 2:20])
mod1_1 <- mirt(df, 1, c('nominal', rep('2PL', 19)))
# plots of characteristic curves for item 1
itemplot(mod0, 1)
itemplot(mod1_all, 1)
itemplot(mod1_1, 1)
# factor scores
fs0 <- fscores(mod0)
fs1_all <- fscores(mod1_all)
fs1_1 <- fscores(mod1_1)
plot(fs1_all ~ fs0)
plot(fs1_1 ~ fs0)
# linear model
round(coef(lm(fs1_all ~ fs0)), 4)
# (Intercept)         fs0
#     -0.0001     -0.9972
So the new theta seems to measure something like 'ignorance' rather than 'knowledge', since it is almost exactly the negative of the original theta.
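A quick way to detect this kind of flip without fitting a second model is to correlate the factor scores with the number-correct score. The sketch below is base R only and uses simulated stand-in data; `orient_scores()`, `total` and `fs` are hypothetical names, not part of mirt. With real models one would pass in something like `fscores(mod1_all)` and `rowSums(scoredGMAT)`.

```r
set.seed(1)

# Simulated stand-ins: number-correct scores on a 20-item test and
# factor scores that point in the opposite direction.
total <- rbinom(200, size = 20, prob = 0.6)
fs <- -as.numeric(scale(total)) + rnorm(200, sd = 0.1)

# Hypothetical helper: flip the sign of the factor scores if they are
# negatively correlated with the total score.
orient_scores <- function(fs, total) {
  sign(cor(fs, total)) * fs
}

fs_oriented <- orient_scores(fs, total)
cor(fs, total) < 0           # the raw scores are flipped
cor(fs_oriented, total) > 0  # the oriented scores are not
```

This only relabels the direction of the scale after the fact. Fixing the scoring coefficients to the answer key, as in the other answer, handles the orientation inside the model instead.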
Thank you Juan for your ideas, they were really helpful!
Upvotes: 0