Reputation: 3852
I have two dataframes:
df1
ua ub a b c
1 11 12 1 1 0
2 12 13 2 2 2
3 13 14 1 1 1
4 14 15 1 1 1
5 15 16 1 1 1
6 16 17 1 1 1
df2
ua ub a b c d e f
1 11 12 1 1 0 1 1 1
2 14 15 1 1 1 1 1 1
3 16 17 1 1 1 1 1 1
4 12 13 2 2 2 2 2 2
I want to join the two dataframes on ua
and ub
, and add on the columns in df2
not in df1
onto df1
, and "fill in" the missing rows with a 0
.
This would give:
ua ub a b c d e f
1 11 12 1 1 0 1 1 1
2 12 13 2 2 2 2 2 2
3 13 14 1 1 1 0 0 0
4 14 15 1 1 1 1 1 1
5 15 16 1 1 1 0 0 0
6 16 17 1 1 1 1 1 1
I can get the column names in df2
not in df1
by doing:
setdiff(names(df2), names(df1))
But I'm a bit stuck on how to then merge / concat the remaining results.
I looked at this q here but no luck making it work for me.
In Pandas Python, I could use merge
combined with .ffill()
but I'm unsure how to translate this into R. How can I do this?
Thanks for your help.
Upvotes: 2
Views: 947
Reputation: 16121
One possible solution is to use all column names that df1
and df2
have in common:
df1 = read.table(text = "
ua ub a b c
1 11 12 1 1 0
2 12 13 2 2 2
3 13 14 1 1 1
4 14 15 1 1 1
5 15 16 1 1 1
6 16 17 1 1 1
", header=T)
df2 = read.table(text = "
ua ub a b c d e f
1 11 12 1 1 0 1 1 1
2 14 15 1 1 1 1 1 1
3 16 17 1 1 1 1 1 1
4 12 13 2 2 2 2 2 2
", header=T)
library(dplyr)
df1 %>%
left_join(df2, by=c("ua", "ub", "a", "b", "c")) %>%
mutate_all(function(x) ifelse(is.na(x), 0, x))
# ua ub a b c d e f
# 1 11 12 1 1 0 1 1 1
# 2 12 13 2 2 2 2 2 2
# 3 13 14 1 1 1 0 0 0
# 4 14 15 1 1 1 1 1 1
# 5 15 16 1 1 1 0 0 0
# 6 16 17 1 1 1 1 1 1
Another way to do the above without manually specifying the common columns names is this
common_names = intersect(names(df1), names(df2))
df1 %>%
left_join(df2, by=common_names) %>%
mutate_all(function(x) ifelse(is.na(x), 0, x))
Otherwise, you can remove any columns from df2
that you don't want to join and have them twice in your final dataset and use by=c("ua", "ub")
when you join:
names_to_use = c("ua", "ub", setdiff(names(df2), names(df1)))
df2_upd = df2[,names_to_use]
df1 %>%
left_join(df2_upd, by=c("ua", "ub")) %>%
mutate_all(function(x) ifelse(is.na(x), 0, x))
Upvotes: 1
Reputation: 1198
Assuming you want to retain all cases of df1
and not df2
. Below could be used
library(dplyr)
df3 <- left_join(df1, df2)
df3[is.na(df3)] <- 0
if all cases of df1
as well as df2
is required then replace left_join
with full join
in above code
Upvotes: 2