Reputation: 2385
I would like to select n rows from y that matches the strings in x where n= length of x, but the same row in y should not be selected more than one time. The rows should be selected randomly from y.
> head(x$Age_Yrs_Sex)
[1] "65_0" "72_1" "82_0" "52_0" "81_0" "58_0"
> head(y,20)
ID Age_Yrs_Sex
1 10678800017 30_0
2 106788000024 63_0
4 10678800048 59_0
5 1067880000055 68_1
7 1067800079 59_0
8 10678800086 36_1
10 10678000109 39_0
12 1067880123 42_0
13 10678800130 45_1
14 106788000147 49_1
15 1067880000154 24_0
16 106780000161 44_0
17 1067880000178 43_1
19 106780000192 79_0
20 106880000208 22_0
22 107880000222 89_0
23 167880000239 28_0
24 106788000246 44_1
25 106780000253 76_0
26 106780000260 45_1
Upvotes: 1
Views: 67
Reputation: 9570
Assuming that the entries in x
are always less than those in y
for a given match, this should work (using dplyr
). Generating usable example data here:
y <-
data.frame(
ID = 1:1000
, Age_Yrs_Sex = paste(sample(1:10, 1000, TRUE)
, 0:1
, sep = "_")
)
x <-
data.frame(
Age_Yrs_Sex = paste(c(1,1:4), 0, sep = "_")
)
Count the number of matches for each thing (can be skipped if it is always 1)
matches <-
table(x$Age_Yrs_Sex)
Filter the table to just the matches, then select from each group the number of matches found in the table above (using slice
, randomly sample row numbers from 1 to the number of rows, returning the number of results of that match from the table).
y %>%
filter(Age_Yrs_Sex %in% names(matches)) %>%
group_by(Age_Yrs_Sex) %>%
slice(sample(1:n(), matches[as.character(Age_Yrs_Sex[1])]))
Gives (for example):
ID Age_Yrs_Sex
<int> <fctr>
1 95 1_0
2 777 1_0
3 151 2_0
4 951 3_0
5 403 4_0
Upvotes: 1