user2300940
user2300940

Reputation: 2385

Merge without replacement to get unique rows only

I would like to select n rows from y that matches the strings in x where n= length of x, but the same row in y should not be selected more than one time. The rows should be selected randomly from y.

> head(x$Age_Yrs_Sex)
[1] "65_0" "72_1" "82_0" "52_0" "81_0" "58_0"


> head(y,20)
               ID Age_Yrs_Sex
1  10678800017        30_0
2  106788000024        63_0
4  10678800048        59_0
5  1067880000055        68_1
7  1067800079        59_0
8  10678800086        36_1
10 10678000109        39_0
12 1067880123        42_0
13 10678800130        45_1
14 106788000147        49_1
15 1067880000154        24_0
16 106780000161        44_0
17 1067880000178        43_1
19 106780000192        79_0
20 106880000208        22_0
22 107880000222        89_0
23 167880000239        28_0
24 106788000246        44_1
25 106780000253        76_0
26 106780000260        45_1

Upvotes: 1

Views: 67

Answers (1)

Mark Peterson
Mark Peterson

Reputation: 9570

Assuming that the entries in x are always less than those in y for a given match, this should work (using dplyr). Generating usable example data here:

y <-
  data.frame(
    ID = 1:1000
    , Age_Yrs_Sex = paste(sample(1:10, 1000, TRUE)
                        , 0:1
                        , sep = "_")
  )

x <-
  data.frame(
    Age_Yrs_Sex = paste(c(1,1:4), 0, sep = "_")
  )

Count the number of matches for each thing (can be skipped if it is always 1)

matches <-
  table(x$Age_Yrs_Sex)

Filter the table to just the matches, then select from each group the number of matches found in the table above (using slice, randomly sample row numbers from 1 to the number of rows, returning the number of results of that match from the table).

y %>%
  filter(Age_Yrs_Sex %in% names(matches)) %>%
  group_by(Age_Yrs_Sex) %>%
  slice(sample(1:n(), matches[as.character(Age_Yrs_Sex[1])]))

Gives (for example):

     ID Age_Yrs_Sex
  <int>      <fctr>
1    95         1_0
2   777         1_0
3   151         2_0
4   951         3_0
5   403         4_0

Upvotes: 1

Related Questions