bttomio
bttomio

Reputation: 2306

Merge data frames and sum columns with the same name

I have a relative large number of years in each data frame, with different country names in each of them. In my reproducible example, df2 contains country d, which is not present in df1. I could achieve my goal, shown by df3, using several lines of code. df3 should be the sum of both df1 and df2, conditionally to country name and year. I am sure there is an easier way, but I cannot find a solution by myself. Your help is very welcome and I thank you in advance.

df1 <- data.frame(country = c("a", "b", "c"), year1 = c(1, 2, 3), year2 = c(1, 2, 3))
df2 <- data.frame(country = c("a", "b", "d"), year1 = c(1, 2, 3), year2 = c(1, 2, 3))

df3 <- merge(df1, df2, by = "country", all = TRUE) %>%
  replace_na(list(
    year1.x = 0, year1.y = 0,
    year2.x = 0, year2.y = 0)) %>%
  mutate(
    year1 = year1.x + year1.y,
    year2 = year2.x + year2.y) %>%
  select(-c(
    year1.x, year1.y,
    year2.x, year2.y))

This gives my expected result, but I would need a lot of manual typing to achieve it for a large number of years.

df3 generated with this code:

  country year1 year2
1       a     2     2
2       b     4     4
3       c     3     3
4       d     3     3

Upvotes: 1

Views: 1604

Answers (3)

AlexB
AlexB

Reputation: 3269

One way would be:

library(dplyr)

bind_rows(df1, df2) %>%
  #mutate_if(is.numeric, tidyr::replace_na, 0) %>% #in case of having NAs
  group_by(country) %>%
  summarise_all(., sum, na.rm = TRUE)
  

# # A tibble: 4 x 3
# country year1 year2
# <chr>     <dbl>   <dbl>
# 1 a             2       2
# 2 b             4       4
# 3 c             3       3
# 4 d             3       3

or a base r solution

aggregate(. ~ country, rbind(df1, df2), sum, na.rm = TRUE, na.action = NULL)

which would generate the same output.

Upvotes: 2

Elia
Elia

Reputation: 2584

A very simple base solution:

df3 <- merge.data.frame(df1, df2, by = "country",all = TRUE,suffixes=c("","")
df3[is.na(df3)] <- 0
df3 <- cbind(country=df3$country,df3[,2:3]+df3[,4:5])


  country year1 year2
1       a     2     2
2       b     4     4
3       c     3     3
4       d     3     3

Upvotes: 1

Yuriy Saraykin
Yuriy Saraykin

Reputation: 8880

data.table

rbindlist(list(df1, df2))[, lapply(.SD, sum, na.rm =T), by = country]
   country year1 year2
1:       a     2     2
2:       b     4     4
3:       c     3     3
4:       d     3     3

Upvotes: 3

Related Questions