Reputation: 1532
I have three data frames. One 'main' data frame to which I want to merge the other two sub data frames.
main = data.frame(a = LETTERS[1:5], b = round(rnorm(5),2))
a b
A 0.41
B -1.06
C -0.65
D -1.27
E 1.03
sub1 = data.frame(a = LETTERS[2:4], c = sample(c(100, 200, 300), 3, replace=T))
a c
B 200
C 200
D 200
sub2 = data.frame(a = LETTERS[c(1,5)], c = sample(c(999, 888), 2, replace=T))
a c
A 999
E 888
In the first merge, I want to merge sub1
to main
via "a"
and this works.
merged1 = merge(main, sub1, by="a", all.x=T)
a b c
A 0.41 NA
B -1.06 300
C -0.65 300
D -1.27 100
E 1.03 NA
Now I want to join/merge the sub2
to merged1
. In my case, the values provided by sub2
substitute the NAs in column c. Specifically I want to replace these NAs with the values from sub2
. I tried this (and other arguments for all.x
, all.y
:
merge(merged1, sub2, by="a", all.x=T)
a b c.x c.y
A 0.41 NA 999
B -1.06 300 NA
C -0.65 300 NA
D -1.27 100 NA
E 1.03 NA 999
How can I get this to be only one column c with the merge?
Upvotes: 0
Views: 219
Reputation: 225
Note: My seed was set, so the results are different than those in the question.
If you bind your sub1 or sub2 together, you can do it all in one merge.
set.seed(42)
main = data.frame(a = LETTERS[1:5], b = round(rnorm(5),2))
sub1 = data.frame(a = LETTERS[2:4], c = sample(c(100, 200, 300), 3, replace=T))
sub2 = data.frame(a = LETTERS[c(1,5)], c = sample(c(999, 888), 2, replace=T))
#Create one data frame
all_sub <- rbind(sub1,sub2)
merged <- merge(main, all_sub, by="a", all.x=T)
The output is as expected:
> merged
a b c
1 A 1.37 999
2 B -0.56 200
3 C 0.36 300
4 D 0.63 300
5 E 0.40 999
Upvotes: 4