Paul de Barros

Reputation: 1200

Correlation matrix for subset of columns from a multiple imputation dataset

I am using the mice package in R to do multiple imputations of a dataset with a large amount of missingness. There are variables in the raw dataset that are important for the imputation process, and for later analyses. However, I want to create a correlation matrix using cor() without including some of the variables. Normally, for a simple dataset x, cor(x[,3:7]) would yield the correlation matrix for columns 3 through 7. If x is a mids object created by the mice function, one would normally use with to perform a repeated analysis to create a mira object, and then use pool to create a mipo pooled outcomes object. However, the second element of with is supposed to be a formula that references the columns of the dataset, and that is not the kind of input that goes into cor(). If x is a mids object, cor(x[,3:7]) does not work, and neither does with(x, cor(x[,3:7])).

How can I create a pooled correlation matrix for a subset of the variables from a multiple imputation dataset?

#reproducible example
library(mice)
x <- data.frame(matrix(rnorm(100), 10, 10))  #create random data
x[9:10, ] <- NA  #add missingness
x.mice <- mice(x)  #make imputed data set
cor(x.mice[, 3:7])  #doesn't work
with(x.mice, cor(x.mice[, 3:7]))  #doesn't work
with(x.mice[, 3:7], cor())  #doesn't work

Upvotes: 2

Views: 1955

Answers (1)

George GL

Reputation: 39

I've had the same problem. The newly added package "miceadds" adds very useful functionality to the mice package.

Specifically, for your problem, look up the function micombine.cor, which does inference for correlations and covariances for multiply imputed datasets.

For example:

library(missForest)
library(mice)
library(miceadds)

#Get the data
data <- iris

#introduce missings
iris.mis <- prodNA(iris, noNA = 0.1)


#impute data
imputed <- mice(iris.mis, m = 5, maxit = 5, method = "pmm")

#correlations for the first three variables (package miceadds)
#note: mi.res must be the mids object returned by mice(), not the raw data
correlations <- miceadds::micombine.cor(mi.res = imputed, variables = 1:3)

#and because I am a psychologist and don't like scientific notation...
correlations$p_value <- format(correlations$p, scientific = FALSE)
correlations
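
If you only need the pooled point estimates as a matrix and would rather see what is going on, here is a rough sketch of doing it by hand: compute cor() on each completed dataset, average on the Fisher z scale, and transform back. pool_cor is a name I made up for illustration (it is not part of mice or miceadds), and it gives no standard errors or p-values, so treat it as a sanity check rather than a replacement for micombine.cor. The usage part assumes the nhanes example data that ships with mice.

```r
# pool_cor() is a hypothetical helper, not a function from mice or miceadds.
# It pools correlation matrices across completed datasets (point estimates only).
pool_cor <- function(datasets, cols) {
  # Fisher z-transform the correlation matrix of each completed dataset
  z_list <- lapply(datasets, function(d) {
    r <- cor(d[, cols])
    diag(r) <- 0  # atanh(1) is Inf, so the diagonal is restored afterwards
    atanh(r)
  })
  z_mean <- Reduce(`+`, z_list) / length(z_list)  # average on the z scale
  r_pooled <- tanh(z_mean)                        # back-transform to r
  diag(r_pooled) <- 1
  r_pooled
}

# Usage sketch; skipped if mice is not installed
if (requireNamespace("mice", quietly = TRUE)) {
  imp <- mice::mice(mice::nhanes, m = 5, printFlag = FALSE, seed = 1)
  # one completed data frame per imputation
  completed <- lapply(seq_len(imp$m), function(i) mice::complete(imp, i))
  print(pool_cor(completed, c("bmi", "chl")))
}
```

For the question's object, the same idea would be pool_cor(completed, 3:7), with completed built from x.mice.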

Upvotes: 1
