user121
user121

Reputation: 931

Matching multiple tables (dataframes) using ID

I have the following 3 tables:

ID H2
3254D YUB
1011A WSX
1089 WER
1043 RFR
ID Q2
1021A QAZ
1089 EDR
3254D DFH
1043 RFV
ID X2
1011A WSX
1569V PKJ
1089 UIO
3254D BHY

As a newbie to R,I am looking for a way to find the intersecting columns in these 3 tables using the ID column. The desired output I am looking at are the following 3 tables:

ID H2
1089 WER
3254D YUB
ID Q2
1089 EDR
3254D DFH
ID X2
1089 UIO
3254D BHY

Please also note, that I also have a situation where it can be more than 3 tables, i.e., the solution should be able to generalize to more than 3 tables. Appreciate any suggestions. Cheers.

Upvotes: 1

Views: 396

Answers (1)

akrun
akrun

Reputation: 887068

We can use intersect to get the intersecting 'ID' from all the datasets . Then, use that to subset the individual datasets

ids <- Reduce(intersect, list(df1$ID, df2$ID, df3$ID))
subdf1 <- subset(df1, ID %in% ids)
subdf2 <- subset(df2, ID %in% ids)
subdf3 <- subset(df3, ID %in% ids)

-output

subdf1
#     ID  H2
#1 3254D YUB
#3  1089 WER
subdf2
#     ID  Q2
#2  1089 EDR
#3 3254D DFH

subdf3
#     ID  X2
#3  1089 UIO
#4 3254D BHY

If there are many datasets, and the object names follow a pattern i.e. 'df' followed by some digits, then use mget to get them into a list

lst1 <- mget(ls(pattern = '^df\\d+$'))

Get the common intersecting elements after extracting the 'ID' column

ids <- Reduce(intersect, lapply(lst1, `[[`, "ID"))

and subset the list elements and keep it in a list

lst2 <- lapply(lst1, subset, subset = ID %in% ids)

-output

lst2
#$df1
#     ID  H2
#1 3254D YUB
#3  1089 WER

#$df2
#     ID  Q2
#2  1089 EDR
#3 3254D DFH

#$df3
#     ID  X2
#3  1089 UIO
#4 3254D BHY

data

df1 <- structure(list(ID = c("3254D", "1011A", "1089", "1043"), H2 = c("YUB", 
"WSX", "WER", "RFR")), class = "data.frame", row.names = c(NA, 
-4L))

df2 <- structure(list(ID = c("1021A", "1089", "3254D", "1043"), Q2 = c("QAZ", 
"EDR", "DFH", "RFV")), class = "data.frame", row.names = c(NA, 
-4L))

df3 <- structure(list(ID = c("1011A", "1569V", "1089", "3254D"), X2 = c("WSX", 
"PKJ", "UIO", "BHY")), class = "data.frame", row.names = c(NA, 
-4L))

Upvotes: 3

Related Questions