Reputation: 477
I have the following data frames:
a <- data.frame(Test=1:4,
TestA=5:6)
> a
Test TestA
1 1 5
2 2 6
3 3 5
4 4 6
b <- data.frame(TEST=1:10,
TestB=11:20)
> b
TEST TestB
1 1 11
2 2 12
3 3 13
4 4 14
5 5 15
6 6 16
7 7 17
8 8 18
9 9 19
10 10 20
And I want to combine them such that the result looks like:
Test TestA TEST TestB
1 1 5 1 11
2 2 6 2 12
3 3 5 3 13
4 4 6 4 14
5 0 0 5 15
6 0 0 6 16
7 0 0 7 17
8 0 0 8 18
9 0 0 9 19
10 0 0 10 20
That is, to combine with unmatched rows filled with zeros.
I am convinced that a simple solution exists and it would be fine if there is a way with dplyr
.
Upvotes: 3
Views: 22900
Reputation: 47310
You could use package sqldf
:
library(sqldf)
res <- sqldf("SELECT a.*, b.* FROM b LEFT JOIN a on a.test = B.test")
res[is.na(res)] <- 0
res
# Test TestA TEST TestB
# 1 1 5 1 11
# 2 2 6 2 12
# 3 3 5 3 13
# 4 4 6 4 14
# 5 0 0 5 15
# 6 0 0 6 16
# 7 0 0 7 17
# 8 0 0 8 18
# 9 0 0 9 19
# 10 0 0 10 20
Or using only SQL
, leveraging the function coalesce
as @G. Grothendieck mentions:
sqldf("SELECT coalesce(a.Test, 0) Test, coalesce(a.TestA, 0) TestA, b.* FROM b LEFT JOIN a using(test)")
Upvotes: 0
Reputation: 66819
With data.table, you could
b
and then a
like...
# input
a <- data.frame(Test=1:4, TestA=5:6)
b <- data.frame(TEST=1:10, TestB=11:20)
library(data.table)
setDT(a); setDT(b)
# prefill
b[, c("Test", "TestA") := 0L]
# update join
b[a, on=.(TEST = Test), c("Test", "TestA") := .(i.Test, i.TestA)]
TEST TestB Test TestA
1: 1 11 1 5
2: 2 12 2 6
3: 3 13 3 5
4: 4 14 4 6
5: 5 15 0 0
6: 6 16 0 0
7: 7 17 0 0
8: 8 18 0 0
9: 9 19 0 0
10: 10 20 0 0
This modifies b
instead of creating a new table. This works for your example, but if you need a "full join" (where b
doesn't have the full set of rows you want in the final table), another answer will fit better.
On the other hand, if your table has real NAs that you don't want filled with zeros, this is a better approach than the first few answers (which overwrite all NAs, not just those resulting from a row being unmatched in the tables' join/merge/combination).
To generalize to more columns, define a list of default
values...
# input
a <- data.frame(Test=1:4, TestA=5:6)
b <- data.frame(TEST=1:10, TestB=11:20)
library(data.table)
setDT(a); setDT(b)
defaults = list(Test = 0L, TestA = 0L)
new_cols = names(defaults)
# prefill defaults
b[, (new_cols) := defaults]
# update join
b[a, on=.(TEST = Test), (new_cols) := mget(sprintf("i.%s", new_cols))]
Upvotes: 1
Reputation: 16121
# example datasets
a <- data.frame(Test=1:4,
TestA=5:6)
b <- data.frame(TEST=1:10,
TestB=11:20)
library(dplyr)
a %>%
mutate(TEST = Test) %>% # duplicate Test column and give the name TEST
full_join(b, by="TEST") %>% # full join
mutate_at(vars(Test, TestA), ~coalesce(.,0L)) # replace NAs with 0s for those two variables
# Test TestA TEST TestB
# 1 1 5 1 11
# 2 2 6 2 12
# 3 3 5 3 13
# 4 4 6 4 14
# 5 0 0 5 15
# 6 0 0 6 16
# 7 0 0 7 17
# 8 0 0 8 18
# 9 0 0 9 19
# 10 0 0 10 20
You can also use mutate_all(~coalesce(.,0L))
, but no need to look all columns for NAs if you know that they will exist only on those two columns.
Upvotes: 0
Reputation: 166
You can combine two data frames using merge()
.
df<-merge(x=a,y=b,by.x="Test",by.y = "TEST",all= T)
The above produces:
Test TestA TestB
1 1 5 11
2 2 6 12
3 3 5 13
4 4 6 14
5 5 NA 15
6 6 NA 16
7 7 NA 17
8 8 NA 18
9 9 NA 19
10 10 NA 20
If you want Test
and TEST
separate, you can create an ID column for both, and replace by.x
and by.y
with that ID variable.
To replace NAs with 0s, you can use df$TestA[is.na(TestA)]<-0
. Same for Test
if you want to keep both Test
and TEST
.
Upvotes: 3
Reputation: 13125
df <- merge(a, b, by = 0, all = TRUE,sort = FALSE)[-1]
df[is.na(df)] <- 0
df
Test TestA TEST TestB
1 1 5 1 11
2 2 6 2 12
3 3 5 3 13
4 4 6 4 14
5 0 0 5 15
6 0 0 6 16
7 0 0 7 17
8 0 0 8 18
9 0 0 9 19
10 0 0 10 20
Upvotes: 0
Reputation: 712
Somewhat unusual to want the Test column repeated and to want 0s instead of NAs, but exactly as you ask would be:
library(dplyr)
b$Test <- b$TEST
c <- full_join(a,b, by="Test")
c$Test[is.na(c$TestA)] <-0
c$TestA[is.na(c$TestA)] <-0
Upvotes: 0