Reputation: 1200
I am using the mice package in R to do multiple imputations of a dataset with a large amount of missingness. There are variables in the raw dataset that are important for the imputation process, and for later analyses. However, I want to create a correlation matrix using cor() without including some of the variables. Normally, for a simple dataset x, cor(x[,3:7]) would yield the correlation matrix for columns 3 through 7. If x is a mids object created by the mice() function, one would normally use with() to perform a repeated analysis to create a mira object, and then use pool() to create a mipo pooled outcomes object. However, the second element of with() is supposed to be a formula that references the columns of the dataset, and that is not the kind of input that goes into cor(). If x is a mids object, cor(x[,3:7]) does not work, and neither does with(x, cor(x[,3:7])).
How can I create a pooled correlation matrix for a subset of the variables from a multiple imputation data set?
#reproducible example
library(mice)
x = data.frame(matrix(rnorm(100),10,10)) #create random data
x[9:10,] = NA #add missingness
x.mice = mice(x) #make imputed data set
cor(x.mice[,3:7]) #doesn't work
with(x.mice, cor(x.mice[,3:7])) #doesn't work
with(x.mice[,3:7], cor()) #doesn't work
Upvotes: 2
Views: 1955
Reputation: 39
I've had the same problem. The newly added package "miceadds" adds very useful functionality to the mice package.
Specifically, for your problem, look up the function micombine.cor which does inference for correlations and covariances for multiply imputed datasets.
For example:
library(missForest)
library(mice)
library(miceadds)
#get the data and introduce missing values
iris.mis <- prodNA(iris, noNA = 0.1)
#impute the data
imputed <- mice(iris.mis, m = 5, maxit = 5, method = "pmm")
#correlations for the first three variables (package miceadds)
#note: mi.res must be the mids object returned by mice(), not the raw data
correlations <- miceadds::micombine.cor(mi.res = imputed, variables = c(1:3))
#and because i am a psychologist and don't like scientific notation...
correlations$p_value <- format(correlations$p, scientific = FALSE)
correlations
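If you would rather see what the pooling involves, or want a full matrix back, one common approach is to compute the correlation matrix in each completed dataset, average the Fisher z-transformed values, and transform back. Below is a minimal base R sketch of that idea. Note that pool_cor is a hypothetical helper I made up for illustration (it is not part of mice or miceadds), it only pools the point estimates (no standard errors or p-values), and the list of datasets here is simulated as a stand-in. With a real mids object, I would expect the list to come from something like mice::complete(x.mice, "all").

```r
# Hypothetical helper (not part of mice or miceadds): pools correlation
# matrices across completed datasets by averaging on the Fisher z scale.
pool_cor <- function(datasets, cols) {
  # one correlation matrix per completed dataset
  cors <- lapply(datasets, function(d) cor(d[, cols]))
  # Fisher z-transform; zero the diagonal first because atanh(1) is Inf
  zs <- lapply(cors, function(r) {
    diag(r) <- 0
    atanh(r)
  })
  # average the z values over the m datasets, then back-transform
  z_mean <- Reduce(`+`, zs) / length(zs)
  pooled <- tanh(z_mean)
  diag(pooled) <- 1
  pooled
}

# Simulated stand-in for m = 5 completed datasets (assumption for the demo):
# the "observed" rows are identical, only the "imputed" rows differ.
set.seed(1)
base <- data.frame(matrix(rnorm(200), 20, 10))
datasets <- lapply(1:5, function(i) {
  d <- base
  d[19:20, ] <- matrix(rnorm(20), 2, 10)
  d
})

pooled <- pool_cor(datasets, 3:7)
round(pooled, 2)
```

The result is an ordinary 5 x 5 matrix for columns 3 through 7, so it can be passed on to anything that expects the output of cor(). For inference on the individual correlations, micombine.cor is still the more convenient route.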
Upvotes: 1