Reputation: 3
I have a basic question about updating value in certain rows based on a reference. I am sure someone else has asked similar quesions, but I could not find the exact answers. Below is the description of my question. It would be great if someone could suggest a link of answer or provide solutions here. Thank you in advance.
I have a table with more than 10,000 rows, like this:
Quality City
a1 pending NY
a2 pending LA
a3 pending LA
a4 pending SF
a5 pending SF
a6 pending NY
a7 pending Chicago
...
a10000 pending DC
I would like to update the name in column "Quality" based on a subset of reference, like this:
Quality City
a1 bad NY
a3 good LA
a5 good SF
a6 bad NY
...
a200 good Chicago
Here is the result I am looking for:
Quality City
a1 bad NY
a2 pending LA
a3 good LA
a4 pending SF
a5 good SF
a6 bad NY
a7 pending Chicago
...
a200 good Chicago
...
a10000 pending DC
Upvotes: 0
Views: 29
Reputation: 79208
We could do a full_join
, then coalesce
the quality columns:
library(tidyverse)
df1 %>%
rownames_to_column("id") %>%
full_join(rownames_to_column(df2, "id"), by=c("id", "City")) %>%
mutate(Quality = coalesce(Quality.y, Quality.x), Quality.x = NULL, Quality.y = NULL)
id City Quality
1 a1 NY bad
2 a2 LA pending
3 a3 LA good
4 a4 SF pending
5 a5 SF good
6 a6 NY bad
7 a7 Chicago pending
8 a10000 DC pending
9 a200 Chicago good
Upvotes: 1
Reputation: 887048
We can use a left_join
library(dplyr)
left_join(df1, df2, by = 'City') %>%
mutate(Quality = Quality.y)
Upvotes: 1