Reputation: 1119
I'm using the MatchIt package on a system with 128GB of RAM.
My data does not contain any NAs. My first attempt, using a generalised linear model (which defaults to logistic regression) with nearest-neighbour matching, worked:
headache6MontsMatch1 <- matchit(Headache_past_six_months ~ sex + age + townsend + alcohol + smoking, method="nearest", distance="glm", data=reducedDF)
But of approximately 100,000 records, I lost approximately 30,000 in the matching. I would like to try optimal "full" matching instead:
headache6MontsMatch2 <- matchit(Headache_past_six_months ~ sex + age + townsend + alcohol + smoking, method="full", link="probit", distance="glm", data=reducedDF)
Unfortunately, this throws the error:
NAs produced by integer overflow
Error in if ((nc * nr > getMaxProblemSize()) && warning.requested) { :
missing value where TRUE/FALSE needed
Looking further into getMaxProblemSize(), it appears that I'm restricted to a hard limit on the size of the matching problem. So I've tried:
setMaxProblemSize()
Double-checking the problem size with getMaxProblemSize() then yields Inf.
But I'm still running into the same error. My machine is using only around 56GB of its 128GB of RAM, CPU usage is about 6%, and the disk is barely being touched.
Upvotes: 0
Views: 425
Reputation: 4414
This is a somewhat funny error, and it has nothing to do with MatchIt. It arises because R stores integers in 32 bits and cannot represent an integer larger than .Machine$integer.max (2147483647).
I assume you have approximately 35000 treated and 65000 control units. optmatch, the package MatchIt relies on for full matching, computes the problem size as nc * nr, where nc is the number of control units and nr is the number of treated units. optmatch stores these numbers as integers because they are the dimensions of a distance matrix used internally. With nr = 35000 and nc = 65000, nc * nr is 2,275,000,000, which exceeds the largest integer R can represent (2,147,483,647), so R produces NA for this value instead. Because NA cannot be used as the condition of an if statement, the error is thrown. This is also why raising the limit to Inf does not help: comparing NA with Inf still gives NA.
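To see the overflow outside of optmatch, here is a minimal sketch in base R. The group sizes are the assumed 35000 and 65000 from above, not values read from the actual data, and the if statement only mimics the shape of the check in the error message; it is not the optmatch source.

```r
# Assumed group sizes (the real ones would come from the data)
nr <- 35000L  # treated units, stored as integer
nc <- 65000L  # control units, stored as integer

# Integer multiplication overflows and returns NA
# (R warns "NAs produced by integer overflow"; suppressed here)
size_int <- suppressWarnings(nc * nr)
is.na(size_int)  # TRUE

# Comparing NA with Inf gives NA, and if() cannot branch on NA
msg <- tryCatch(
  if (size_int > Inf) "too large" else "ok",
  error = function(e) conditionMessage(e)
)
msg  # "missing value where TRUE/FALSE needed"
```

So the limit returned by getMaxProblemSize() is never really consulted: the left-hand side of the comparison is already NA.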
There is no solution to this problem other than using a smaller sample or asking the optmatch developers to fix this bug. They could easily fix it by converting nc and nr to double values before computing nc * nr.
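As a rough illustration of the conversion idea, and of how small a "smaller sample" would need to be, here is a sketch in base R. problem_size() is a hypothetical helper written for this example; it is not a function in optmatch or MatchIt, and the group sizes are again the assumed ones.

```r
# Hypothetical helper (not part of optmatch or MatchIt):
# convert to double first so the product cannot overflow to NA
problem_size <- function(n_treated, n_control) {
  as.numeric(n_treated) * as.numeric(n_control)
}

size <- problem_size(35000L, 65000L)  # equals 2275000000, no NA
size > .Machine$integer.max           # TRUE: too big for an integer

# With 35000 treated units, the largest number of controls whose
# integer product with 35000 is still representable
max_controls <- floor(.Machine$integer.max / 35000)
max_controls  # 61356
```

With the real data, the two group sizes could be read from something like table(reducedDF$Headache_past_six_months) and passed to the helper to check whether a given subsample stays under the integer limit.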
Edit 8/21/21: I contacted the optmatch maintainers and they fixed this issue. The fix will be included in the upcoming version of optmatch.
Upvotes: 1