Overwrite values from selected columns and matching rows from one data frame into another, R

Question

I have 2 data frames, df1 and df2. Column names are the same in both data frames and column 'field_name' is a unique identifier (each row has a unique field name). df2 doesn't have as many rows as df1.

Some values in df2 differ from those in df1, and I want to import these values to df1 so that the original values in df1 are overwritten with those from df2.

The 'different' values that I want to pull into df1 will always be found in columns of df2 which contain the string '4' (definitely a string, has text before the number) or the string 'tgp'. For these columns, I want to replace the values in df1 with those from df2. However, this won't apply to every row: df2 is smaller than df1 so I need to match on rows so that the values are replaced only in rows of df1 which are also present in df2.

Code here shows what I'm after:

df1 <- data.frame(list(field_name = c("a", "b", "c", "d"),  A3=c("23", "35", "2", "5"), A4=c("94", "120", "67", "5"), B3=c("35", "64", "87", "70"), B4=c("99", "76", "22", "83"), tgp=c("1000", "1542", "758", "964")))

df2 <- data.frame(list(field_name = c("a", "b", "d"),       A3=c("13", "28", "10"),      A4=c("56", "11", "16"),       B3=c("42", "58", "91"),       B4=c("13", "64", "12"),       tgp=c("1154", "1200", "900")))

df_result <- data.frame(list(field_name = c("a", "b", "c", "d"), A3=c("23", "35", "2", "5"), A4=c("56", "11", "67", "16"), B3=c("35", "64", "87", "70"), B4=c("13", "64", "22", "12"), tgp=c("1154", "1200", "758", "900")))

In this example, I've taken values from df2 for field names "a", "b" and "d", from columns 'A4', 'B4' and 'tgp', and overwritten the values in the same locations in df1. For this small example I could do it by hand with indexing but I have many columns containing the required strings, and many rows.

I got this far (with help!) but it doesn't work because of the unequal number of rows in the two data frames:

columns <- which(grepl(colnames(df1), pattern = "4|tgp"))
rows <- which(apply(df1, 1, function(x) which(x %in% intersect(df1$field_name, df2$field_name)))==T)
 

new_df <- df1
for(col in columns){
  for (r in rows){
  new_df[r,col] <- df2[r,col]
  }
}

I'd love a vectorized solution if possible. I've tried playing with 'mutate_at' and 'replace' but so far haven't managed to get anything that works. The following...

df1 %>%
  left_join(df2, by = "field_name")

...from this post looked hopeful but I haven't yet managed to make the 'transmute' work.

Thank you very much.

Gregor Thomas · Accepted Answer

The experimental function rows_update (introduced in dplyr version 1.0) does what you want nicely:

rows_update(
  df1,
  ## use only the columns from df2 that you want to update
  ## plus the joining column
  select(df2, field_name, tgp, ends_with("4")),
  by = "field_name"
)
#   field_name A3 A4 B3 B4  tgp
# 1          a 23 56 35 13 1154
# 2          b 35 11 64 64 1200
# 3          c  2 67 87 22  758
# 4          d  5 16 70 12  900

See ?rows_update for details. There is also rows_insert which adds new rows, rows_upsert which adds new rows and updates existing rows, and a couple other options.

Overwrite values from selected columns and matching rows from one data frame into another, R

Answers (1)

Related Questions