s.shi
s.shi

Reputation: 87

Merge 2 data frames by row and column overlap

I would like to merge 2 data frames additively such that

      taxonomy A B C
1          rat 0 1 2
2          dog 1 2 3
3          cat 2 3 0

and

      taxonomy A D C
1          rat 0 1 9
2        Horse 0 2 6
3          cat 2 0 2

produce

      taxonomy A B C  D
1          rat 0 1 11 1
2        Horse 0 0 6  2 
3          cat 4 3 2  0
4          dog 1 2 3  0

I've tried aggregate, merge, apply, ddply.... with no success...this will be done on 2 data frames with a couple hundred rows and columns

Upvotes: 4

Views: 130

Answers (3)

markus
markus

Reputation: 26343

The data.table equivalent of @avid_useR's answer.

library(data.table)
rbindlist(list(df1, df2), fill = TRUE)[, lapply(.SD, sum, na.rm = TRUE), by = taxonomy]
#   taxonomy A B  C D
#1:      rat 0 1 11 1
#2:      dog 1 2  3 0
#3:      cat 4 3  2 0
#4:    Horse 0 0  6 2

Upvotes: 2

Frank
Frank

Reputation: 66819

You can do...

> library(reshape2)
> dcast(rbind(melt(DF1), melt(DF2)), taxonomy ~ variable, fun.aggregate = sum)
Using taxonomy as id variables
Using taxonomy as id variables
  taxonomy A B  C D
1      cat 4 3  2 0
2      dog 1 2  3 0
3    Horse 0 0  6 2
4      rat 0 1 11 1

This sorts the rows and columns alphabetically, but I guess this might be avoidable by using a factor.

Data:

DF1 = structure(list(taxonomy = c("rat", "dog", "cat"), A = 0:2, B = 1:3, 
    C = c(2L, 3L, 0L)), .Names = c("taxonomy", "A", "B", "C"), row.names = c(NA, 
-3L), class = "data.frame")
DF2 = structure(list(taxonomy = c("rat", "Horse", "cat"), A = c(0L, 
0L, 2L), D = c(1L, 2L, 0L), C = c(9L, 6L, 2L)), .Names = c("taxonomy", 
"A", "D", "C"), row.names = c(NA, -3L), class = "data.frame")

Upvotes: 1

acylam
acylam

Reputation: 18661

With bind_rows from dplyr:

library(dplyr)

bind_rows(df1, df2) %>%
  group_by(taxonomy) %>%
  summarize_all(sum, na.rm = TRUE)

Output:

# A tibble: 4 x 5
  taxonomy     A     B     C     D
  <chr>    <int> <int> <int> <int>
1 cat          4     3     2     0
2 dog          1     2     3     0
3 Horse        0     0     6     2
4 rat          0     1    11     1

Data:

df1 <- structure(list(taxonomy = c("rat", "dog", "cat"), A = 0:2, B = 1:3, 
    C = c(2L, 3L, 0L)), .Names = c("taxonomy", "A", "B", "C"), class = "data.frame", row.names = c("1", 
"2", "3"))

df2 <- structure(list(taxonomy = c("rat", "Horse", "cat"), A = c(0L, 
0L, 2L), D = c(1L, 2L, 0L), C = c(9L, 6L, 2L)), .Names = c("taxonomy", 
"A", "D", "C"), class = "data.frame", row.names = c("1", "2", 
"3"))

Upvotes: 4

Related Questions