Reputation: 35
I have 2 data frames, df1 and df2. Column names are the same in both data frames and column 'field_name' is a unique identifier (each row has a unique field name). df2 doesn't have as many rows as df1.
Some values in df2 differ from those in df1, and I want to import these values to df1 so that the original values in df1 are overwritten with those from df2.
The 'different' values that I want to pull into df1 will always be found in columns of df2 which contain the string '4' (definitely a string, has text before the number) or the string 'tgp'. For these columns, I want to replace the values in df1 with those from df2. However, this won't apply to every row: df2 is smaller than df1 so I need to match on rows so that the values are replaced only in rows of df1 which are also present in df2.
Code here shows what I'm after:
df1 <- data.frame(list(field_name = c("a", "b", "c", "d"), A3=c("23", "35", "2", "5"), A4=c("94", "120", "67", "5"), B3=c("35", "64", "87", "70"), B4=c("99", "76", "22", "83"), tgp=c("1000", "1542", "758", "964")))
df2 <- data.frame(list(field_name = c("a", "b", "d"), A3=c("13", "28", "10"), A4=c("56", "11", "16"), B3=c("42", "58", "91"), B4=c("13", "64", "12"), tgp=c("1154", "1200", "900")))
df_result <- data.frame(list(field_name = c("a", "b", "c", "d"), A3=c("23", "35", "2", "5"), A4=c("56", "11", "67", "16"), B3=c("35", "64", "87", "70"), B4=c("13", "64", "22", "12"), tgp=c("1154", "1200", "758", "900")))
In this example, I've taken values from df2 for field names "a", "b" and "d", from columns 'A4', 'B4' and 'tgp', and overwritten the values in the same locations in df1. For this small example I could do it by hand with indexing but I have many columns containing the required strings, and many rows.
I got this far (with help!) but it doesn't work because of the unequal number of rows in the two data frames:
columns <- which(grepl(colnames(df1), pattern = "4|tgp"))
rows <- which(apply(df1, 1, function(x) which(x %in% intersect(df1$field_name, df2$field_name)))==T)
new_df <- df1
for(col in columns){
for (r in rows){
new_df[r,col] <- df2[r,col]
}
}
I'd love a vectorized solution if possible. I've tried playing with 'mutate_at' and 'replace' but so far haven't managed to get anything that works. The following...
df1 %>%
left_join(df2, by = "field_name")
...from this post looked hopeful but I haven't yet managed to make the 'transmute' work.
Thank you very much.
Upvotes: 1
Views: 938
Reputation: 146224
The experimental function rows_update
(introduced in dplyr
version 1.0) does what you want nicely:
rows_update(
df1,
## use only the columns from df2 that you want to update
## plus the joining column
select(df2, field_name, tgp, ends_with("4")),
by = "field_name"
)
# field_name A3 A4 B3 B4 tgp
# 1 a 23 56 35 13 1154
# 2 b 35 11 64 64 1200
# 3 c 2 67 87 22 758
# 4 d 5 16 70 12 900
See ?rows_update
for details. There is also rows_insert
which adds new rows, rows_upsert
which adds new rows and updates existing rows, and a couple other options.
Upvotes: 1