I want to match up to 3 controls to each case, subject to three conditions:
Example data:
```
#> dat
# A tibble: 14 x 6
patid age gender status pracid eventdate
<dbl> <dbl> <chr> <chr> <dbl> <date>
1 1 10 M case 100 23-05-20
2 2 20 F case 200 12-01-20
3 3 44 M case 300 21-02-20
4 4 11 F case 100 14-01-20
5 111 12 M control 100 NA
6 222 11 M control 100 NA
7 333 8 M control 100 NA
8 444 12 F control 200 NA
9 555 11 M control 100 NA
10 666 22 F control 100 NA
11 777 21 F control 100 NA
12 888 18 M control 200 NA
13 999 21 M control 200 NA
14 1000 18 M control 100 NA
```
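To make the example reproducible without the original tibble, the printed data can be rebuilt in base R. This is a sketch that assumes the event dates are day-month-year (so 23-05-20 is 23 May 2020); the printout does not say which order is used.

```r
# Rebuild the example data as a base R data frame.
# Assumption: event dates are dd-mm-yy, e.g. "23-05-20" = 2020-05-23.
dat <- data.frame(
  patid  = c(1, 2, 3, 4, 111, 222, 333, 444, 555, 666, 777, 888, 999, 1000),
  age    = c(10, 20, 44, 11, 12, 11, 8, 12, 11, 22, 21, 18, 21, 18),
  gender = c("M", "F", "M", "F", "M", "M", "M", "F", "M", "F", "F", "M", "M", "M"),
  status = rep(c("case", "control"), times = c(4, 10)),
  pracid = c(100, 200, 300, 100, 100, 100, 100, 200, 100, 100, 100, 200, 200, 100),
  stringsAsFactors = FALSE
)
# Cases have an event date; controls have none (NA).
dat$eventdate <- as.Date(c("23-05-20", "12-01-20", "21-02-20", "14-01-20",
                           rep(NA, 10)), format = "%d-%m-%y")
str(dat)
```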
Expected outcome: for patid 1, the eligible controls are listed below, and I just need to select 3 of them at random.
patid age gender group pracid
111 12 M control 100
222 11 M control 100
333 8 M control 100
555 11 M control 100
I do not want two cases to share the same control: every case needs its own unique controls (unique patid). I would also like the final output to tell me, for each control, which case it was matched to (in the example below they were matched to patid 1), and I want the case's event date copied to its controls too. E.g.
patid age gender group pracid matched_id match_eventdate
1 10 M case 100 1 23-05-20
111 12 M control 100 1 23-05-20
222 11 M control 100 1 23-05-20
333 8 M control 100 1 23-05-20
555 11 M control 100 1 23-05-20
I need the event date to be copied because in other parts of the dataset I need to check how many diseases cases and controls were diagnosed with after that event date (essentially, the event date is the index date for both cases and controls).
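Once each row carries match_eventdate, counting diagnoses after the index date is a join followed by a date filter. A minimal base R sketch: `cohort` holds case 1 and three of its controls from the expected output, while `diagnoses` (and its `diagdate` column) is an invented table used only for illustration.

```r
# A case and three of its controls, with the copied index date.
cohort <- data.frame(
  patid           = c(1, 111, 222, 333),
  matched_id      = rep(1, 4),
  match_eventdate = rep(as.Date("2020-05-23"), 4)
)
# Hypothetical diagnosis records for illustration: one row per diagnosis.
diagnoses <- data.frame(
  patid    = c(1, 1, 111, 222, 222, 222, 555),
  diagdate = as.Date(c("2020-06-01", "2020-04-01", "2020-07-15", "2020-05-23",
                       "2020-08-30", "2020-10-02", "2020-09-01"))
)
# Attach each patient's index date (inner join drops non-cohort patients),
# then keep only diagnoses strictly after it.
d <- merge(diagnoses, cohort, by = "patid")
d <- d[d$diagdate > d$match_eventdate, ]
# Post-index diagnoses per cohort patient; patients with none get 0.
cohort$n_after <- as.vector(table(factor(d$patid, levels = cohort$patid)))
cohort
```

The strict `>` means a diagnosis on the index date itself is not counted; use `>=` if same-day diagnoses should be included.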
This is straightforward using MatchIt. Below is the code you would use to perform the matching:
```r
library(MatchIt)

# Treatment indicator: TRUE for cases, FALSE for controls.
m.out <- matchit(I(status == "case") ~ age, data = dat,
                 exact = ~pracid + gender,
                 caliper = c(age = 3), std.caliper = FALSE,
                 distance = "euclidean", ratio = 3)
```
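This does 3:1 nearest-neighbor matching on age, ensuring that patients are exactly matched on pracid and gender and that all controls are within 3 years of age of their matched case. Because matching is done without replacement (replace = FALSE is the default), no control is used for more than one case.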
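If MatchIt is not available, or you want the random selection described in the question rather than nearest-neighbor matching, the same constraints can be sketched in base R: exact match on pracid and gender, age within a caliper, at most 3 controls per case, and each control used at most once. This is a greedy sketch, not what matchit() does internally; `match_controls()` is a hypothetical helper, and the 3-year caliper simply mirrors caliper = c(age = 3) above.

```r
set.seed(1)  # selection is random; the seed only makes the run reproducible

# Example data from the question (dates assumed to be dd-mm-yy).
dat <- data.frame(
  patid  = c(1, 2, 3, 4, 111, 222, 333, 444, 555, 666, 777, 888, 999, 1000),
  age    = c(10, 20, 44, 11, 12, 11, 8, 12, 11, 22, 21, 18, 21, 18),
  gender = c("M", "F", "M", "F", "M", "M", "M", "F", "M", "F", "F", "M", "M", "M"),
  status = rep(c("case", "control"), times = c(4, 10)),
  pracid = c(100, 200, 300, 100, 100, 100, 100, 200, 100, 100, 100, 200, 200, 100),
  stringsAsFactors = FALSE
)
dat$eventdate <- as.Date(c("23-05-20", "12-01-20", "21-02-20", "14-01-20",
                           rep(NA, 10)), format = "%d-%m-%y")

# Hypothetical helper: greedy random matching without replacement.
match_controls <- function(data, ratio = 3, caliper = 3) {
  cases    <- data[data$status == "case", ]
  controls <- data[data$status == "control", ]
  used <- rep(FALSE, nrow(controls))  # TRUE once a control is assigned
  sets <- list()
  for (i in seq_len(nrow(cases))) {
    cs <- cases[i, ]
    # Eligible: unused, same practice, same gender, age within the caliper.
    ok <- which(!used &
                controls$pracid == cs$pracid &
                controls$gender == cs$gender &
                abs(controls$age - cs$age) <= caliper)
    if (length(ok) == 0) next  # no eligible controls: case is left out
    # Draw up to `ratio` controls at random; indexing through sample.int()
    # avoids sample()'s special case when only one control is eligible.
    pick <- ok[sample.int(length(ok), min(ratio, length(ok)))]
    used[pick] <- TRUE
    set <- rbind(cs, controls[pick, ])
    set$matched_id      <- cs$patid      # case patid labels the whole set
    set$match_eventdate <- cs$eventdate  # case event date copied to controls
    sets[[length(sets) + 1]] <- set
  }
  if (length(sets) == 0) return(NULL)
  res <- do.call(rbind, sets)
  rownames(res) <- NULL
  res
}

res <- match_controls(dat)
res
```

Cases earlier in the data get first pick when eligible controls are shared, so a later case can end up with fewer than 3 controls.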
Next we extract the matched dataset using match.data():

```r
m.data <- match.data(m.out, subclass = "matched_id")
```
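Finally, we re-order the dataset and fill in the missing event dates:

```r
# Sort so each case ("case" sorts before "control") is followed by its controls.
m.data <- m.data[with(m.data, order(matched_id, status, patid)), ]
m.data$match_eventdate <- m.data$eventdate
# Within each matched set, copy the case's event date (the only non-NA one)
# to every row of the set.
for (i in levels(m.data$matched_id)) {
  in_i <- which(m.data$matched_id == i)
  m.data$match_eventdate[in_i] <- na.omit(m.data$eventdate[in_i])
}
```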
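The loop can also be replaced by a single lookup: for each row, match() finds the case row in the same matched set, and that row's event date is copied. A sketch on a small hypothetical stand-in for the match.data() output (`toy` is an illustrative name, control 901 is invented), assuming exactly one case per matched set as in 1:k matching:

```r
toy <- data.frame(
  patid      = c(1, 111, 222, 333, 4, 901),
  status     = c("case", "control", "control", "control", "case", "control"),
  matched_id = factor(c(1, 1, 1, 1, 2, 2)),
  eventdate  = as.Date(c("2020-05-23", NA, NA, NA, "2020-01-14", NA))
)
is_case <- toy$status == "case"
# Row index of the case belonging to each row's matched set.
case_row <- which(is_case)[match(toy$matched_id, toy$matched_id[is_case])]
toy$match_eventdate <- toy$eventdate[case_row]
toy
```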
You can examine the matched sets either by printing the m.data object, which will look close to what you specified above, or by examining m.out$match.matrix, which identifies the controls matched to each case.
Note that if a case does not receive any controls, it will be dropped from the matched dataset. If it receives only 1 or 2 controls, it will remain in the dataset, but its matched controls will have weights that you must include when estimating the effect. If you don't want any cases with fewer than 3 controls, there is no way to remove them in matchit(), but you can drop them from the matched dataset afterwards. A complete set has 4 rows (the case plus its 3 controls):

```r
# Keep only matched sets of size 4: one case plus 3 controls.
subclass_4 <- levels(m.data$matched_id)[table(m.data$matched_id) == 4]
m.data <- m.data[m.data$matched_id %in% subclass_4, ]
```
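An alternative that does not rely on the total set size is to count the controls in each matched set directly. A sketch on a hypothetical stand-in for m.data (`toy`, with invented control 901), where set 1 is complete and set 2 has only one control:

```r
toy <- data.frame(
  patid      = c(1, 111, 222, 333, 4, 901),
  status     = c("case", "control", "control", "control", "case", "control"),
  matched_id = factor(c(1, 1, 1, 1, 2, 2))
)
# Number of controls in each matched set, named by matched_id level.
n_controls <- tapply(toy$status == "control", toy$matched_id, sum)
# which() skips NA counts for any empty factor levels.
complete_sets <- names(n_controls)[which(n_controls == 3)]
toy <- toy[toy$matched_id %in% complete_sets, ]
toy
```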