Anthony Nash

Reputation: 1119

MatchIt (propensity score matching package for R) throws errors concerning NAs produced

I'm using the MatchIt package on a system with 128GB of RAM.

First, my data contain no NAs. My first attempt, using a generalised linear model (which defaults to logistic regression) with "nearest neighbour" matching, worked:

headache6MontsMatch1 <- matchit(Headache_past_six_months ~ sex + age + townsend + alcohol + smoking, method="nearest", distance="glm", data=reducedDF)

But of approximately 100,000 records, I lost approximately 30,000 in the matching. I would like to try optimal "full" matching instead:

headache6MontsMatch2 <- matchit(Headache_past_six_months ~ sex + age + townsend + alcohol + smoking, method="full", link="probit", distance="glm", data=reducedDF) 

Unfortunately, this throws the error:

NAs produced by integer overflow
Error in if ((nc * nr > getMaxProblemSize()) && warning.requested) { : 
  missing value where TRUE/FALSE needed

Looking further into getMaxProblemSize(), it appears as though I'm restricted to a hard limit for matching. So I've tried:

setMaxProblemSize()

Double-checking the problem size with getMaxProblemSize() then yields Inf.

But I'm still running into the same problem. Resources do not look like the bottleneck: the machine sits comfortably at around 56GB of its 128GB of RAM, CPU usage is only 6%, and the disk is barely touched.

Upvotes: 0

Views: 425

Answers (1)

Noah

Reputation: 4414

This is a kind of funny error and has nothing to do with MatchIt itself. It comes from optmatch, which MatchIt calls for full matching, and from the fact that R cannot represent large numbers as integers.

I assume you have approximately 35000 treated and 65000 control units. optmatch computes the problem size as nc * nr, where nc is the number of control units and nr is the number of treated units. optmatch stores these numbers as integers because they are the dimensions of a distance matrix used internally. With nr = 35000 and nc = 65000, nc * nr is 2,275,000,000, which exceeds the largest integer R can represent (.Machine$integer.max, 2,147,483,647), so R produces NA for this value instead. Because NA cannot be used as the condition of an if statement, the error is thrown. This is also why raising the limit with setMaxProblemSize() does not help: the overflow happens before the comparison with the limit is ever made.
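The overflow can be reproduced in plain R without MatchIt or optmatch. This is a minimal sketch; the unit counts are the assumed values from above, not the real ones from the question, and `warning_requested` is a stand-in for the flag in the error message.

```r
# Assumed group sizes (guesses, not taken from the actual data)
nr <- 35000L  # treated units, stored as an integer
nc <- 65000L  # control units, stored as an integer

# Largest value R's 32-bit integer type can hold
max_int <- .Machine$integer.max

# Integer multiplication overflows and yields NA (normally with the
# "NAs produced by integer overflow" warning, suppressed here)
size <- suppressWarnings(nc * nr)
overflowed <- is.na(size)

# Using that NA as an if condition raises the error from the question
warning_requested <- TRUE
msg <- tryCatch(
  if ((size > Inf) && warning_requested) "problem too large" else "ok",
  error = function(e) conditionMessage(e)
)
print(overflowed)
print(msg)
```

Note that the comparison is made against Inf here, mirroring the setting after setMaxProblemSize(), and the error still occurs.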

There is no solution to this problem except to use a smaller sample or ask the optmatch developers to fix this bug. They could easily fix this by converting nc and nr to double values before computing nc * nr.
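Both options can be sketched in a few lines of base R. This is an illustration only, not optmatch's actual code; the counts are the assumed values from above, and `max_controls` is a hypothetical name introduced here.

```r
# Assumed group sizes (guesses, not taken from the actual data)
nr <- 35000L  # treated units
nc <- 65000L  # control units

# The kind of fix suggested above: convert to double before multiplying,
# so the product is computed in floating point and cannot overflow to NA
size_dbl <- as.numeric(nc) * as.numeric(nr)
exceeds_int <- size_dbl > .Machine$integer.max

# The workaround: the largest number of controls that keeps nc * nr
# within integer range for this many treated units
max_controls <- .Machine$integer.max %/% nr

print(size_dbl)
print(exceeds_int)
print(max_controls)
```

With the assumed 35000 treated units, keeping the product within integer range would mean subsampling the controls to at most `max_controls` before matching. Whether that subsample is acceptable for the analysis is a separate question.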


Edit 8/21/21: I contacted the optmatch maintainers and they fixed this issue. It will be corrected in the upcoming version of optmatch.

Upvotes: 1
