Reputation: 15
I am trying to impute the missing values of the C1-C3 variables of a large dataset using the mice package. That has worked so far. The problem arises when I try to use the gWQS package to estimate the mixture effect of the chemicals X1-X4.
I imputed the missing values of my covariates using the mice package and then tried to use the imputed data in the gWQS package to run a WQS regression. However, my code is not accepted because imp$imp is a list. I have also tried the miWQS package, but it is limited to imputation methods that I do not want to use.
The original dataset consists of Y as a continuous outcome, X1-X4 as continuous measures of exposure, and C1-C3 as covariates that were imputed with mice.
Imputation model using mice
imp <- mice::mice(originaldf,m=2, meth=meth, pred=pred,
seed=51162,visitSequence="monotone", pri=FALSE)
toxic_chems=c("X1" , "X2", "X3", "X4")
set.seed(2019)
library("gWQS")
gwqs(Y ~ C1 + C2 + C3, mix_name=toxic_chems, data=imp$imp,
q=4, validation=0.8, valid_var=NULL, b=10, b1_pos=F, b1_constr=F,
family="gaussian", seed=2019, wqs2=T, plots=T, tables=T)
Error:
Error in .check.function(formula, mix_name, data, q, validation, valid_var, :
data must be a data.frame
Upvotes: 1
Views: 189
Reputation: 73397
As you've already noticed, imp$imp is a list: mice() returns a "mids" object, and its imp element holds, for each variable, the imputed values from every imputation (two in your case, since you've chosen m=2). That's how multiple imputation works. Here is an example with the nhanes data included in mice:
library(mice)  # makes nhanes, pool() and complete() available
imp <- mice(nhanes, m=2)
imp$imp
# $age
# [1] 1 2
# <0 rows> (or 0-length row.names)
#
# $bmi
# 1 2
# 1 30.1 25.5
# 3 27.2 28.7
# 4 20.4 24.9
# [...]
#
# $hyp
# 1 2
# 1 1 1
# 4 1 2
# 6 1 2
# [...]
#
# $chl
# 1 2
# 1 187 187
# 4 131 186
# 10 229 187
# [...]
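If you were using OLS, the standard way would be to fit the model on each imputed dataset and pool the results. mice supports this through with(), which evaluates the model expression on every completed dataset of the "mids" object (older versions of the package also shipped an lm.mids wrapper for this).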
fit <- with(data=imp, exp=lm(bmi ~ age + hyp + chl))
pool(fit)
pool(fit)$pooled[, 1:5] # shortened
# estimate ubar b t dfcom
# (Intercept) 20.28615169 1.354978e+01 6.556134e+00 2.338398e+01 21
# age -3.01670128 1.081655e+00 1.238383e-03 1.083512e+00 21
# hyp 1.89935232 4.074904e+00 2.092851e+00 7.214181e+00 21
# chl 0.04517373 3.813968e-04 5.113178e-06 3.890666e-04 21
And this is the point where you run into a problem, because there exists no gwqs.mids method (but there is a glm.mids method), and you probably need to write it yourself, or ask one of the package authors.
However, mice includes a complete() function, which yields a "data.frame" with which you could also do pooled calculations. It should be used with care, though: using anything other than the "long" format (i.e. just one single imputation) would be very wrong.
complete(imp, "long")
# .imp .id age bmi hyp chl
# 1 1 1 1 30.1 1 187
# 2 1 2 2 22.7 1 187
# 3 1 3 1 27.2 1 187
# [...]
# 26 2 1 1 25.5 1 187
# 27 2 2 2 22.7 1 187
# 28 2 3 1 28.7 1 187
# [...]
class(complete(imp, "long"))
# [1] "data.frame"
The ".imp" variable now indicates the number of the imputation, and you could fit your gwqs model separately for each subset defined by ".imp".
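A minimal sketch of that per-imputation loop, using the nhanes example from above. The helper fit_one is hypothetical and uses lm() purely as a stand-in; you would replace its body with your own gwqs() call, whose arguments depend on the gWQS version you have installed.

```r
library(mice)

# Same toy setup as above; seed and printFlag are only there for a quiet, repeatable run
imp <- mice(nhanes, m = 2, seed = 1, printFlag = FALSE)

# Stack all imputations into one data.frame with an .imp indicator
long <- mice::complete(imp, "long")

# One completed data.frame per imputation
per_imp <- split(long, long$.imp)

# Hypothetical placeholder model: swap lm() for your gwqs() call
fit_one <- function(d) lm(bmi ~ age + hyp + chl, data = d)

# Fit the model once per imputed dataset
fits <- lapply(per_imp, fit_one)

# Coefficients side by side, one column per imputation
sapply(fits, coef)
```

Each element of per_imp is an ordinary data.frame, so it should pass the "data must be a data.frame" check that imp$imp fails.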
To pool the results now, you'd have to consider between and within variances (see Rubin 1987:76).
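As a rough sketch of what that pooling involves for a single coefficient, the function below combines per-imputation estimates and standard errors into a pooled estimate and a total variance. The name pool_rubin and the input numbers are made up for illustration; degrees of freedom and confidence intervals are not covered here.

```r
# Rubin's rules for one scalar parameter.
# est: point estimates, one per imputed dataset
# se:  their standard errors, in the same order
pool_rubin <- function(est, se) {
  m    <- length(est)
  qbar <- mean(est)               # pooled point estimate
  ubar <- mean(se^2)              # within-imputation variance
  b    <- var(est)                # between-imputation variance
  t    <- ubar + (1 + 1 / m) * b  # total variance
  c(estimate = qbar, ubar = ubar, b = b, t = t, se = sqrt(t))
}

# Made-up numbers, for illustration only
pooled <- pool_rubin(est = c(1, 3), se = c(1, 1))
pooled
```

The names ubar, b and t mirror the columns of pool(fit)$pooled shown earlier. For hyp with m=2, for instance, 4.074904 + 1.5 * 2.092851 gives the reported t of about 7.214181.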
To elaborate further on this, though, would go too far for Stack Overflow. If you don't know how to do this, you'd need to consult a statistician, or ask at Cross Validated how to do that.
At least this would be a way to use mice and gWQS together.
Upvotes: 1