Lesimster

Reputation: 13

Can I conduct pooled regression analysis on only a subsample of a dataset imputed with MICE in R?

I conducted multiple imputation using the 'mice' package in R. Afterwards, I calculated pooled regression analyses using the 'with' and 'pool' functions.

For further analyses, I only want to look at a specific subsample of the data. I would like to use the imputed data with pooled regression analysis for that as well.

However, I am struggling to find a way to achieve that, because pooled regression analysis in 'mice' works by using the 'with' and 'lm' functions on an object of class 'mids' instead of just calling 'lm' on a data frame. Therefore, I can't just subset the data by conventional means, such as square brackets or the 'subset' function.

I know that I could theoretically just extract the imputed datasets using the 'complete' function, conduct regression analyses on these datasets, and then pool the results by hand, but I would like to avoid that.

An example of what I want to do would be:

library(mice)

data <- as.data.frame(matrix(data = c(3, 2, 3, 4, 5, NA, 7, 10, 9, NA, NA, 12, 13, 14, 15, 16, NA, 18), nrow = 6))
names(data) <- c("a", "b", "c")
data$Sex <- c("male", "male", "female", "male", "female", "female")

imp <- mice(data = data,
            m = 20,
            maxit = 10,
            seed = 12,
            print = FALSE)

Now, I can conduct pooled regression analysis by using:

summary(pool(with(imp, lm(a ~ b + c))))

What I am struggling to achieve is conducting a regression analysis on only the male subjects.

Upvotes: 1

Views: 459

Answers (2)

lhs

Reputation: 1038

You can use the subset = argument of the lm() function directly:

summary(pool(with(imp, lm(a ~ b + c, subset = Sex == "male"))))
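As a sanity check, the subset = route can be compared with the by-hand route mentioned in the question (complete, fit, pool). The sketch below is only an illustration under assumptions: toy and imp_toy are hypothetical stand-ins rather than the asker's data, and Sex is stored as a factor here. It uses a larger simulated dataset because three male rows leave no residual degrees of freedom for a model with three coefficients.

```r
library(mice)

# Hypothetical toy data (not the asker's): same column names, more rows
set.seed(1)
n <- 60
toy <- data.frame(
  b   = rnorm(n),
  c   = rnorm(n),
  Sex = factor(rep(c("male", "female"), each = n / 2))
)
toy$a <- 1 + 0.5 * toy$b - 0.3 * toy$c + rnorm(n)
toy$a[sample(n, 8)] <- NA   # introduce missing values in a
toy$b[sample(n, 8)] <- NA   # and in b

imp_toy <- mice(toy, m = 5, maxit = 5, seed = 12, printFlag = FALSE)

# Route 1: let lm() drop the non-male rows inside with()
pooled_subset <- pool(with(imp_toy, lm(a ~ b + c, subset = Sex == "male")))

# Route 2: by hand, fitting the same model on each completed dataset
completed <- complete(imp_toy, action = "all")
fits <- lapply(completed, function(d) {
  lm(a ~ b + c, data = d, subset = Sex == "male")
})
pooled_manual <- pool(as.mira(fits))

# Both routes fit the same models, so the pooled estimates should agree
all.equal(summary(pooled_subset)$estimate, summary(pooled_manual)$estimate)
```

If the two routes agree, the shorter subset = call can be used without extracting the completed datasets.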

Upvotes: 0

jpsmith

Reputation: 17550

mice returns an object of class mids, whose rows can be subset with a logical condition using filter:

filter(imp, Sex %in% "male")

# or for more detail
imp_filtered <- filter(imp, Sex %in% "male")
imp_filtered$data

#  a  b  c  Sex
#1 3  7 13 male
#2 2 10 14 male
#4 4 NA 16 male

So to implement this, you can save a filtered object or modify your code slightly:

# save filtered data to new object

imp_filtered <- filter(imp, Sex %in% "male")
summary(pool(with(imp_filtered, lm(a ~ b + c))))

# or all in one go

summary(pool(with(filter(imp, Sex %in% "male"), lm(a ~ b + c))))
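To see what filter does to a mids object, here is a small sketch on the nhanes data bundled with mice. The names imp_nh and imp_young are hypothetical, and the sketch assumes a reasonably recent mice version that ships a filter method for mids objects, with the filter generic coming from dplyr.

```r
library(mice)
library(dplyr)  # assumption: supplies the filter() generic that dispatches on mids

# nhanes ships with mice: age, bmi, hyp, chl, with missing values
imp_nh <- mice(nhanes, m = 5, maxit = 5, seed = 12, printFlag = FALSE)

# Keep only the youngest age group; the result is again a mids object
imp_young <- filter(imp_nh, age == 1)
class(imp_young)
nrow(imp_young$data)   # rows kept from the incomplete data
sum(nhanes$age == 1)   # should match the line above

# Pooled regression on the filtered object
pooled_filter <- pool(with(imp_young, lm(chl ~ bmi)))

# Same subsample selected inside lm() instead
pooled_subset <- pool(with(imp_nh, lm(chl ~ bmi, subset = age == 1)))

# Both select the same rows of the same completed datasets
all.equal(summary(pooled_filter)$estimate, summary(pooled_subset)$estimate)
```

The condition is evaluated on the incomplete data stored in the mids object, so filtering on a fully observed variable such as Sex or age is the straightforward case.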

Upvotes: 1
