user7477092
user7477092

Reputation:

combine two data frames and aggregate

I am having 2 data frames in the below format:

dt1

id     col1    col2    col3    col4 
___    ____    ____    _____   _____
 1      2       3       1       2
 2      3       4       1       1
 3      1       1       1       1
 4      1       2       1       2
 5      1       1       1       1
 6      1       2       1       2

dt2 

id     col1    col2    col3    col4 
___    ____    ____    _____   _____
 1      1       3       1       2
 2      3       4       1       0
 4      1       1       1       1
 6      1       2       1       2
 9      2       1       1       1
12      1       2       1       2

and I want to aggregate and combine these two data frames by the id and the resulting dataframe like

dt3

 id     col1    col2    col3    col4 
    ___    ____    ____    _____   _____
     1      3       6       2       4
     2      6       8       2       1
     3      1       1       1       1
     4      2       3       2       3
     5      1       1       1       1
     6      2       4       2       4
     9      2       1       1       1
    12      1       2       1       2

I tried with dt3=merge(dt1,dt2,all=TRUE) but did not work.Also tried dt3=merge(dt1,dt2,by=id) too did not work.Any help is appreciated.

Upvotes: 2

Views: 3791

Answers (4)

johannes
johannes

Reputation: 14433

Here is a dplyr solution:

library(dplyr)
bind_rows(dt1, dt2) %>% group_by(id) %>% 
  summarise_all(sum)

Data

dt1  <- structure(
  list(id = 1:6, col1 = c(2L, 3L, 1L, 1L, 1L, 1L), 
       col2 = c(3L, 4L, 1L, 2L, 1L, 2L), 
       col3 = c(1L, 1L, 1L, 1L, 1L, 1L), 
       col4 = c(2L, 1L, 1L, 2L, 1L, 2L)), 
  .Names = c("id", "col1", "col2", "col3",  "col4"), 
  class = "data.frame", row.names = c(NA, -6L))


dt2 <- structure(
  list(id = c(1L, 2L, 4L, 6L, 9L, 12L), 
       col1 = c(1L, 3L, 1L, 1L, 2L, 1L), 
       col2 = c(3L, 4L, 1L, 2L, 1L, 2L), 
       col3 = c(1L, 1L, 1L, 1L, 1L, 1L), 
       col4 = c(2L, 0L, 1L, 2L, 1L, 2L)), 
  .Names = c("id", "col1", "col2", "col3", "col4"), 
  class = "data.frame", row.names = c(NA, -6L))

Upvotes: 0

xyz123
xyz123

Reputation: 651

Since they have the same format and the columns match up put them in row by row.

dt3 <- data.frame(dt1)

dt3 <- rbind(dt2) # rbind lines up your observations row by row.

You could probably put that all in one line

dt3 <- data.frame(rbind(dt1, dt2))

Upvotes: 0

akrun
akrun

Reputation: 887098

We can use rbindlist in data.table and get the sum of each column after grouping by 'id'

library(data.table)
rbindlist(mget(paste0('dt', 1:2)))[, lapply(.SD, sum), by = id]
#    id col1 col2 col3 col4
#1:  1    3    6    2    4
#2:  2    6    8    2    1
#3:  3    1    1    1    1
#4:  4    2    3    2    3
#5:  5    1    1    1    1
#6:  6    2    4    2    4
#7:  9    2    1    1    1
#8: 12    1    2    1    2

Or using bind_rows with group_by and summarise_each from tidyverse

librarydplyr)
bind_rows(dt1, dt2) %>%
          group_by(id) %>%
          summarise_each(funs(sum))

Upvotes: 1

lebelinoz
lebelinoz

Reputation: 5068

The magic word you're looking for is rbind: dt3 = rbind(dt1, dt2)

Upvotes: 0

Related Questions