Reputation: 1548
I can't find similar answer on stack overflow. I have 2 data frames: a:
X match_nr matches
12 ACADIA.3 1 ACADIA Pharmaceuticals Inc.
13 ACADIA.3 2 ACADIA PHARMACEUTICALS STOCK WKN 603035 | ISIN US0042251084
14 Acorda 1 Acorda Therapeutics Inc.
15 Advanced Accelerator.1 1
16 Advanced Accelerator.2 1
b:
X id dataset_code database_code
5 ACADIA.3 10973832 ACHC WIKI
6 Acorda 10974692 ACOR WIKI
7 Advanced Accelerator.1 10973902 AXDX WIKI
How to merge them to have "left.join" by X? I used dplyr::left.join, merge and cbind.fill and in all cases the results data.frame are much bigger than both of them, because X is not unique in a (but is unique in b). All I need is to append missing columns and repeat all values for rows where X=X.
EDIT
I'm not sure I'm clear with my explanations. I can draw what I want to achieve: c:
X matches name dataset_code database_code
ACADIA.3 1 ACADIA Pharmaceuticals 10973832 ACHC
ACADIA.3 2 ACADIA PHARMACEUTICALS 10973832 ACHC
Acorda 1 Acorda Therapeutics 10974692 ACOR
Advanced Accelerator.1 1 10973902 AXDX
Advanced Accelerator.2 1
Is it really hard to understand? My closest code is:
cdf <- NULL
for (i in 1:nrow(a)) {
xdf <- cbind.data.frame(a[i,], b[b$X == a$X[i],])
cdf <- rbind.data.frame(cdf, xdf)
}
but it missing the last row of a (only 4) - should be exactly the same number as a
Upvotes: 1
Views: 3708
Reputation: 263301
The 'name' column seems to be a request for something that is not actually in either of your example datasets but if you want a result that delivers most of what you want then just use merge with all.x=TRUE:
merge(a,b,all.x=TRUE)[-c(3,6)]
X match_nr id dataset_code
1 ACADIA.3 1 10973832 ACHC
2 ACADIA.3 2 10973832 ACHC
3 Acorda 1 10974692 ACOR
4 Advanced Accelerator.1 1 10973902 AXDX
5 Advanced Accelerator.2 1 NA <NA>
And if either of the columns that were dropped is what you were expecting for name
, then remove their number from the exclusion vector.
Upvotes: 1
Reputation: 4993
Try an anti_join which is aimed at finding all unique rows, Use this to pull the a values NOT IN B, then rbind
those onto B, and you should have what you are looking for:
c <- anti_join(a, b, by = "X")
This should pull the unique values from A and then
df <- rbind(b,c)
to reattach the b values which are already unique...filling in the previously missing A
values
Upvotes: 0