KNOCK
KNOCK

Reputation: 33

R: How can i merge more 2 data frames with adding values?

I got two data frames like this:

dat1
  col   n
1  A    1
2  B    1
3  C    2


dat2
  col   n
1  A    2
2  B    1
3  C    1
4  D    1

and I want to make a data frame like this with dat1 and dat2:

dat3
  col   n
1  A    3
2  B    2
3  C    3
4  D    1

I'm trying to make data frame (dat3) with dplyr bind_rows, group_by and count, but I can't.

bind_rows(dat1, dat2) %>%
  group_by(col)

result:
  col   n 
1  A    1
2  B    1
3  C    2
4  A    2
5  B    1
6  C    1
7  D    1

bind_rows(dat1, dat2) %>%
  group_by(col) %>%
  count(n)

result:
  col   n   nn
1  A    1    1
2  A    2    1
3  B    1    2
4  C    1    1
5  C    2    1
6  D    1    1

How can I make dat3?

Upvotes: 2

Views: 131

Answers (4)

akrun
akrun

Reputation: 886978

Or in base R,

aggregate(cbind(Sum = n) ~ col, rbind(df1, df2), FUN = sum)
#   col Sum
#1   A   3
#2   B   2
#3   C   3
#4   D   1

data

df1 <- structure(list(col = c("A", "B", "C"), n = c(1L, 1L, 2L)), 
    class = "data.frame", row.names = c("1", 
"2", "3"))

df2 <- structure(list(col = c("A", "B", "C", "D"), n = c(2L, 1L, 1L, 
1L)), class = "data.frame", row.names = c("1", "2", "3", "4"))

Upvotes: 1

r2evans
r2evans

Reputation: 160417

Third option, just in case:

psum <- function(..., na.rm = TRUE) {
  m <- cbind(...)
  apply(m, 1, sum, na.rm = na.rm)
}

full_join(dat1, dat2, by = "col") %>%
  mutate(n = psum(n.x, n.y))
#   col n.x n.y n
# 1   A   1   2 3
# 2   B   1   1 2
# 3   C   2   1 3
# 4   D  NA   1 1

(n.x and n.y columns are generated by the join due to same-named columns, they are retained here solely for demonstration. Yes, psum is a hack here, likely something better out there ...)

Upvotes: 1

alexwhitworth
alexwhitworth

Reputation: 4907

data.table is a superior package to dplyr. I suggest you try it:

library(data.table)
dat1 <- setDT(dat1); dat2 <- setDT(dat2)

dat3 <- rbindlist(list(dat1, dat2))[, .(n= sum(n)), .(col)]

Upvotes: 0

dc37
dc37

Reputation: 16178

You should summarise instead of counting:

bind_rows(dat1, dat2) %>%
  group_by(col) %>% summarise(Sum = sum(n))

# A tibble: 4 x 2
  col     Sum
  <chr> <dbl>
1 A         3
2 B         2
3 C         3
4 D         1

Upvotes: 1

Related Questions