Reputation: 2306
I have a relative large number of years in each data frame, with different country names in each of them. In my reproducible example, df2
contains country d
, which is not present in df1
. I could achieve my goal, shown by df3
, using several lines of code. df3
should be the sum of both df1
and df2
, conditionally to country name and year. I am sure there is an easier way, but I cannot find a solution by myself. Your help is very welcome and I thank you in advance.
df1 <- data.frame(country = c("a", "b", "c"), year1 = c(1, 2, 3), year2 = c(1, 2, 3))
df2 <- data.frame(country = c("a", "b", "d"), year1 = c(1, 2, 3), year2 = c(1, 2, 3))
df3 <- merge(df1, df2, by = "country", all = TRUE) %>%
replace_na(list(
year1.x = 0, year1.y = 0,
year2.x = 0, year2.y = 0)) %>%
mutate(
year1 = year1.x + year1.y,
year2 = year2.x + year2.y) %>%
select(-c(
year1.x, year1.y,
year2.x, year2.y))
This gives my expected result, but I would need a lot of manual typing to achieve it for a large number of years.
df3
generated with this code:
country year1 year2
1 a 2 2
2 b 4 4
3 c 3 3
4 d 3 3
Upvotes: 1
Views: 1604
Reputation: 3269
One way would be:
library(dplyr)
bind_rows(df1, df2) %>%
#mutate_if(is.numeric, tidyr::replace_na, 0) %>% #in case of having NAs
group_by(country) %>%
summarise_all(., sum, na.rm = TRUE)
# # A tibble: 4 x 3
# country year1 year2
# <chr> <dbl> <dbl>
# 1 a 2 2
# 2 b 4 4
# 3 c 3 3
# 4 d 3 3
or a base r solution
aggregate(. ~ country, rbind(df1, df2), sum, na.rm = TRUE, na.action = NULL)
which would generate the same output.
Upvotes: 2
Reputation: 2584
A very simple base
solution:
df3 <- merge.data.frame(df1, df2, by = "country",all = TRUE,suffixes=c("","")
df3[is.na(df3)] <- 0
df3 <- cbind(country=df3$country,df3[,2:3]+df3[,4:5])
country year1 year2
1 a 2 2
2 b 4 4
3 c 3 3
4 d 3 3
Upvotes: 1
Reputation: 8880
data.table
rbindlist(list(df1, df2))[, lapply(.SD, sum, na.rm =T), by = country]
country year1 year2
1: a 2 2
2: b 4 4
3: c 3 3
4: d 3 3
Upvotes: 3