How to join 2 dataframes, fill missing row values in R?

Question

I have two dataframes:

df1

  ua ub a b c
1 11 12 1 1 0
2 12 13 2 2 2
3 13 14 1 1 1
4 14 15 1 1 1
5 15 16 1 1 1
6 16 17 1 1 1

df2

  ua ub a b c d e f
1 11 12 1 1 0 1 1 1
2 14 15 1 1 1 1 1 1
3 16 17 1 1 1 1 1 1
4 12 13 2 2 2 2 2 2

I want to join the two dataframes on ua and ub, and add on the columns in df2 not in df1 onto df1, and "fill in" the missing rows with a 0.

This would give:

  ua ub a b c d e f
1 11 12 1 1 0 1 1 1
2 12 13 2 2 2 2 2 2
3 13 14 1 1 1 0 0 0
4 14 15 1 1 1 1 1 1
5 15 16 1 1 1 0 0 0
6 16 17 1 1 1 1 1 1

I can get the column names in df2 not in df1 by doing:

setdiff(names(df2), names(df1))

But I'm a bit stuck on how to then merge / concat the remaining results.

I looked at this q here but no luck making it work for me.

In Pandas Python, I could use merge combined with .ffill() but I'm unsure how to translate this into R. How can I do this?

Thanks for your help.

AntoniosK · Accepted Answer

One possible solution is to use all column names that df1 and df2 have in common:

df1 = read.table(text = "
ua ub a b c
1 11 12 1 1 0
2 12 13 2 2 2
3 13 14 1 1 1
4 14 15 1 1 1
5 15 16 1 1 1
6 16 17 1 1 1
", header=T)

df2 = read.table(text = "
  ua ub a b c d e f
1 11 12 1 1 0 1 1 1
2 14 15 1 1 1 1 1 1
3 16 17 1 1 1 1 1 1
4 12 13 2 2 2 2 2 2
", header=T)

library(dplyr)

df1 %>%
  left_join(df2, by=c("ua", "ub", "a", "b", "c")) %>%
  mutate_all(function(x) ifelse(is.na(x), 0, x))

#   ua ub a b c d e f
# 1 11 12 1 1 0 1 1 1
# 2 12 13 2 2 2 2 2 2
# 3 13 14 1 1 1 0 0 0
# 4 14 15 1 1 1 1 1 1
# 5 15 16 1 1 1 0 0 0
# 6 16 17 1 1 1 1 1 1

Another way to do the above without manually specifying the common columns names is this

common_names = intersect(names(df1), names(df2))

df1 %>%
  left_join(df2, by=common_names) %>%
  mutate_all(function(x) ifelse(is.na(x), 0, x))

Otherwise, you can remove any columns from df2 that you don't want to join and have them twice in your final dataset and use by=c("ua", "ub") when you join:

names_to_use = c("ua", "ub", setdiff(names(df2), names(df1)))
df2_upd = df2[,names_to_use]

df1 %>%
  left_join(df2_upd, by=c("ua", "ub")) %>%
  mutate_all(function(x) ifelse(is.na(x), 0, x))

How to join 2 dataframes, fill missing row values in R?

Answers (2)

Related Questions