David Z
David Z

Reputation: 7041

How to merge and sum two data frames

Here is my issue:

df1 <- data.frame(x = 1:5, y = 2:6, z = 3:7)
rownames(df1) <- LETTERS[1:5]
df1
  x y z
A 1 2 3
B 2 3 4
C 3 4 5
D 4 5 6
E 5 6 7

df2 <- data.frame(x = 1:5, y = 2:6, z = 3:7)
rownames(df2) <- LETTERS[3:7]
df2
  x y z
C 1 2 3
D 2 3 4
E 3 4 5
F 4 5 6
G 5 6 7

what I wanted is:

  x y z
A 1 2 3
B 2 3 4
C 4 6 8
D 6 8 10
E 8 10 12
F 4 5 6
G 5 6 7

where duplicated rows were added up by same variable.

Upvotes: 17

Views: 22457

Answers (5)

user6376316
user6376316

Reputation:

An alternative is to melt the data and cast it. At first we set the row names to the last column of both data frames thanks to @Jaap

df1$rn <- rownames(df1)
df2$rn <- rownames(df2)

Then we melt the data based on the name

melt(list(df1, df2), id.vars = "rn")

Then we use dcast with mget function which is used to retrieve multiple variables at once.

mydf<- dcast(melt(mget(ls(pattern = "df\\d+")), id.vars = "rn"), 
      rn ~ variable, value.var = "value", fun.aggregate = sum)

rownames(mydf) <- mydf$rn

# get rid of the 'rn' column
mydf <- mydf[, -1]

> mydf
#  x  y  z
#A 1  2  3
#B 2  3  4
#C 4  6  8
#D 6  8 10
#E 8 10 12
#F 4  5  6
#G 5  6  7

Upvotes: 1

Jaap
Jaap

Reputation: 83215

A solution with base R:

# create a new variable from the rownames
df1$rn <- rownames(df1)
df2$rn <- rownames(df2)

# bind the two dataframes together by row and aggregate
res <- aggregate(cbind(x,y,z) ~ rn, rbind(df1,df2), sum)
# or (thx to @alistaire for reminding me):
res <- aggregate(. ~ rn, rbind(df1,df2), sum)

# assign the rownames again
rownames(res) <- res$rn

# get rid of the 'rn' column
res <- res[, -1]

which gives:

> res
  x  y  z
A 1  2  3
B 2  3  4
C 4  6  8
D 6  8 10
E 8 10 12
F 4  5  6
G 5  6  7

Upvotes: 15

MarKo9
MarKo9

Reputation: 97

could also vectorize the operation turning the dfs to matrices:

result_df <- as.data.frame(as.matrix(df1) + as.matrix(df2))

Upvotes: 4

IRTFM
IRTFM

Reputation: 263342

This might need some teaking to get the rownames logic working on a longer example:

dfr <-rbind(df1,df2)
do.call(rbind, lapply( split(dfr, sapply(rownames(dfr),substr,1,1)), colSums))
  x  y  z
A 1  2  3
B 2  3  4
C 4  6  8
D 6  8 10
E 8 10 12
F 4  5  6
G 5  6  7

If the rownames could all be assumed to be alpha characters a gsub solution should be easy.

Upvotes: 2

alistaire
alistaire

Reputation: 43334

With dplyr,

library(dplyr)

# add rownames as a column in each data.frame and bind rows
bind_rows(df1 %>% add_rownames(), 
          df2 %>% add_rownames()) %>% 
    # evaluate following calls for each value in the rowname column
    group_by(rowname) %>% 
    # add all non-grouping variables
    summarise_all(sum)

## # A tibble: 7 x 4
##   rowname     x     y     z
##     <chr> <int> <int> <int>
## 1       A     1     2     3
## 2       B     2     3     4
## 3       C     4     6     8
## 4       D     6     8    10
## 5       E     8    10    12
## 6       F     4     5     6
## 7       G     5     6     7

Upvotes: 12

Related Questions