Christian
Christian

Reputation: 477

combine two data frames or tibbles

I have the following data frames:

a <- data.frame(Test=1:4,
            TestA=5:6)
> a
    Test TestA
1    1     5
2    2     6
3    3     5
4    4     6


b <- data.frame(TEST=1:10,
            TestB=11:20)

> b
    TEST TestB
1     1    11
2     2    12
3     3    13
4     4    14
5     5    15
6     6    16
7     7    17
8     8    18
9     9    19
10   10    20

And I want to combine them such that the result looks like:

    Test TestA TEST   TestB
1     1     5    1      11
2     2     6    2      12
3     3     5    3      13
4     4     6    4      14
5     0     0    5      15
6     0     0    6      16
7     0     0    7      17
8     0     0    8      18
9     0     0    9      19
10    0     0   10      20

That is, to combine with unmatched rows filled with zeros.

I am convinced that a simple solution exists and it would be fine if there is a way with dplyr.

Upvotes: 3

Views: 22900

Answers (6)

moodymudskipper
moodymudskipper

Reputation: 47310

You could use package sqldf :

library(sqldf)
res <- sqldf("SELECT a.*, b.* FROM b LEFT JOIN a on a.test = B.test")
res[is.na(res)] <- 0
res
#    Test TestA TEST TestB
# 1     1     5    1    11
# 2     2     6    2    12
# 3     3     5    3    13
# 4     4     6    4    14
# 5     0     0    5    15
# 6     0     0    6    16
# 7     0     0    7    17
# 8     0     0    8    18
# 9     0     0    9    19
# 10    0     0   10    20

Or using only SQL, leveraging the function coalesce as @G. Grothendieck mentions:

sqldf("SELECT coalesce(a.Test, 0) Test, coalesce(a.TestA, 0) TestA, b.* FROM b LEFT JOIN a using(test)")

Upvotes: 0

Frank
Frank

Reputation: 66819

With data.table, you could

  1. pre-fill 0s in b and then
  2. update-join values in from a

like...

# input
a <- data.frame(Test=1:4, TestA=5:6)
b <- data.frame(TEST=1:10, TestB=11:20)
library(data.table)
setDT(a); setDT(b)

# prefill
b[, c("Test", "TestA") := 0L]

# update join
b[a, on=.(TEST = Test), c("Test", "TestA") := .(i.Test, i.TestA)]

    TEST TestB Test TestA
 1:    1    11    1     5
 2:    2    12    2     6
 3:    3    13    3     5
 4:    4    14    4     6
 5:    5    15    0     0
 6:    6    16    0     0
 7:    7    17    0     0
 8:    8    18    0     0
 9:    9    19    0     0
10:   10    20    0     0

This modifies b instead of creating a new table. This works for your example, but if you need a "full join" (where b doesn't have the full set of rows you want in the final table), another answer will fit better.

On the other hand, if your table has real NAs that you don't want filled with zeros, this is a better approach than the first few answers (which overwrite all NAs, not just those resulting from a row being unmatched in the tables' join/merge/combination).


To generalize to more columns, define a list of default values...

# input
a <- data.frame(Test=1:4, TestA=5:6)
b <- data.frame(TEST=1:10, TestB=11:20)

library(data.table)
setDT(a); setDT(b)    
defaults = list(Test = 0L, TestA = 0L)
new_cols = names(defaults)

# prefill defaults
b[, (new_cols) := defaults]

# update join
b[a, on=.(TEST = Test), (new_cols) := mget(sprintf("i.%s", new_cols))]

Upvotes: 1

AntoniosK
AntoniosK

Reputation: 16121

# example datasets
a <- data.frame(Test=1:4,
                TestA=5:6)

b <- data.frame(TEST=1:10,
                TestB=11:20)

library(dplyr)

a %>%
  mutate(TEST = Test) %>%                        # duplicate Test column and give the name TEST
  full_join(b, by="TEST") %>%                    # full join 
  mutate_at(vars(Test, TestA), ~coalesce(.,0L))  # replace NAs with 0s for those two variables

#    Test TestA TEST TestB
# 1     1     5    1    11
# 2     2     6    2    12
# 3     3     5    3    13
# 4     4     6    4    14
# 5     0     0    5    15
# 6     0     0    6    16
# 7     0     0    7    17
# 8     0     0    8    18
# 9     0     0    9    19
# 10    0     0   10    20

You can also use mutate_all(~coalesce(.,0L)), but no need to look all columns for NAs if you know that they will exist only on those two columns.

Upvotes: 0

Mason Malone
Mason Malone

Reputation: 166

You can combine two data frames using merge().

df<-merge(x=a,y=b,by.x="Test",by.y = "TEST",all= T)

The above produces:

   Test TestA TestB
1     1     5    11
2     2     6    12
3     3     5    13
4     4     6    14
5     5    NA    15
6     6    NA    16
7     7    NA    17
8     8    NA    18
9     9    NA    19
10   10    NA    20

If you want Test and TEST separate, you can create an ID column for both, and replace by.x and by.y with that ID variable.

To replace NAs with 0s, you can use df$TestA[is.na(TestA)]<-0. Same for Test if you want to keep both Test and TEST.

Upvotes: 3

A. Suliman
A. Suliman

Reputation: 13125

df <- merge(a, b, by = 0, all = TRUE,sort = FALSE)[-1]
df[is.na(df)] <- 0
df
   Test TestA TEST TestB
1     1     5    1    11
2     2     6    2    12
3     3     5    3    13
4     4     6    4    14
5     0     0    5    15
6     0     0    6    16
7     0     0    7    17
8     0     0    8    18
9     0     0    9    19
10    0     0   10    20

Upvotes: 0

Luke Hayden
Luke Hayden

Reputation: 712

Somewhat unusual to want the Test column repeated and to want 0s instead of NAs, but exactly as you ask would be:

library(dplyr)
b$Test <- b$TEST
c <- full_join(a,b, by="Test")
c$Test[is.na(c$TestA)] <-0
c$TestA[is.na(c$TestA)] <-0

Upvotes: 0

Related Questions