Reputation: 1711
I want to match 2 controls to every case, subject to two conditions: the age difference must be within ±2, and the income difference must be within ±2. If more than 2 controls qualify for a case, I just need to select 2 of them at random. How do I then generate a new variable that indicates which case each control is matched to? For example, Control1 and Control2 matched to Case1 are encoded as group 1, and Control1 and Control2 matched to Case2 are encoded as group 2.
dat = structure(list(id = c(1, 2, 3, 4, 111, 222, 333, 444, 555, 666,
                            777, 888, 999, 1000),
                     age = c(10, 20, 44, 11, 12, 11, 8, 12, 11, 22, 21, 18, 21, 18),
                     income = c(35, 72, 11, 35, 37, 36, 33, 70, 34, 74, 70, 44, 76, 70),
                     group = c("case", "case", "case", "case", "control", "control",
                               "control", "control", "control", "control", "control",
                               "control", "control", "control")),
                row.names = c(NA, -14L), class = c("tbl_df", "tbl", "data.frame"))
id | age | income | group | index |
---|---|---|---|---|
1 | 10 | 35 | case | 1 |
2 | 20 | 72 | case | 2 |
3 | 44 | 11 | case | 3 |
4 | 11 | 35 | case | 4 |
111 | 12 | 37 | control | 1 |
222 | 11 | 36 | control | 1 |
333 | 8 | 33 | control | 4 |
555 | 11 | 34 | control | 4 |
777 | 21 | 70 | control | 2 |
1000 | 18 | 70 | control | 2 |
This is similar to my previous question, but I want the output to have an extra variable called index that indicates which controls are matched to which case. If a case and a control have the same index, that control is matched to that case.
The question is: how can I create the index, preferably with an approach based on the previous question?
Upvotes: 2
Views: 402
Reputation: 6769
This is based on the accepted answer to your previous post by @AnilGoyal:
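library(dplyr, warn.conflicts = FALSE)

# Add an empty index column, then split into `case` and `control`
# data frames placed in the global environment
dat %>% mutate(index = 0) %>%
  split(.$group) %>%
  list2env(envir = .GlobalEnv)

set.seed(12345)
for (i in seq_len(nrow(case))) {
  # Still-unmatched controls within ±2 of case i on both age and income
  x <- which(between(control$age, case$age[i] - 2, case$age[i] + 2) &
             between(control$income, case$income[i] - 2, case$income[i] + 2) &
             control$index == 0)
  # Pick at most 2 by position; sample(x, ...) would draw from 1:x if x had length 1
  control$index[x[sample.int(length(x), min(2, length(x)))]] <- i
  case$index[i] <- i
}

# Keep every case plus the controls that were matched
matched <- case %>% rbind(control) %>% filter(index > 0)
matched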
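Please note: some cases have more than 2 controls meeting the criteria; for those, 2 controls are selected at random. Because of the control$index == 0 condition, a control already assigned to an earlier case is not reused, so the result depends on the order of the cases. A case with no eligible control (case 3 here) still appears in the output with its own index.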
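If you would rather not write `case` and `control` into the global environment, the same loop can be wrapped in a function. This is only a sketch in base R: `match_controls`, `k` and `tol` are hypothetical names I made up for illustration, and the data frame is retyped from the question. It picks controls by position with `sample.int()` so that a single eligible control is handled safely.

```r
# Hypothetical helper (not from any package): match up to `k` controls per case,
# requiring age and income to be within `tol` of the case's values.
match_controls <- function(dat, k = 2, tol = 2) {
  case <- dat[dat$group == "case", ]
  control <- dat[dat$group == "control", ]
  case$index <- seq_len(nrow(case))
  control$index <- 0
  for (i in seq_len(nrow(case))) {
    # Eligible controls that have not been assigned to an earlier case
    x <- which(abs(control$age - case$age[i]) <= tol &
               abs(control$income - case$income[i]) <= tol &
               control$index == 0)
    # Sample positions, not values, so a length-1 `x` is not expanded to 1:x
    pick <- x[sample.int(length(x), min(k, length(x)))]
    control$index[pick] <- i
  }
  out <- rbind(case, control[control$index > 0, ])
  # Sort so each case is followed by its matched controls
  out[order(out$index, out$group), ]
}

# Data retyped from the question
dat <- data.frame(
  id = c(1, 2, 3, 4, 111, 222, 333, 444, 555, 666, 777, 888, 999, 1000),
  age = c(10, 20, 44, 11, 12, 11, 8, 12, 11, 22, 21, 18, 21, 18),
  income = c(35, 72, 11, 35, 37, 36, 33, 70, 34, 74, 70, 44, 76, 70),
  group = c(rep("case", 4), rep("control", 10))
)

set.seed(12345)
matched <- match_controls(dat)
matched
```

Which controls get picked depends on the seed and on the order of the cases, so the rows may differ from the output shown in this answer.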
> matched
# A tibble: 10 × 5
id age income group index
<dbl> <dbl> <dbl> <chr> <dbl>
1 1 10 35 case 1
2 2 20 72 case 2
3 3 44 11 case 3
4 4 11 35 case 4
5 111 12 37 control 4
6 222 11 36 control 1
7 333 8 33 control 1
8 555 11 34 control 4
9 777 21 70 control 2
10 1000 18 70 control 2
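As a quick sanity check on this result, you can pair every control with its case through the shared index and confirm that both differences stay within ±2. This is a sketch in base R; the data frame below is retyped from the printed output above, and `check` and `ok` are hypothetical names.

```r
# Rows retyped from the printed `matched` output
matched <- data.frame(
  id = c(1, 2, 3, 4, 111, 222, 333, 555, 777, 1000),
  age = c(10, 20, 44, 11, 12, 11, 8, 11, 21, 18),
  income = c(35, 72, 11, 35, 37, 36, 33, 34, 70, 70),
  group = c(rep("case", 4), rep("control", 6)),
  index = c(1, 2, 3, 4, 4, 1, 1, 4, 2, 2)
)

cases <- matched[matched$group == "case", ]
controls <- matched[matched$group == "control", ]

# Inner join on index: one row per matched control, next to its case
check <- merge(controls, cases, by = "index", suffixes = c(".control", ".case"))
check$age_diff <- abs(check$age.control - check$age.case)
check$income_diff <- abs(check$income.control - check$income.case)

# TRUE when every matched control satisfies both conditions
ok <- all(check$age_diff <= 2 & check$income_diff <= 2)
ok

# Number of controls matched to each case
table(controls$index)
```

Case 3 has no matched control, so it drops out of the inner join and does not appear in `check`.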
Upvotes: 2