Reputation: 1
I am trying to change the value in one dataframe based on the conditions pulling information from another dataframe.
Lets say I have 2 dataframes linked by the common column ID:
df1
ID a b c
1 111 NA NA NA
2 222 NA NA NA
3 333 NA NA NA
4 444 NA NA NA
5 555 NA NA NA
6 666 NA NA NA
df2
ID a b c
1 111 Open No Yes
2 222 Closed No No
3 333 Open Yes Yes
4 444 Open No No
5 555 Closed No Yes
6 666 Open Yes Yes
All of the columns in df1 are NA with the exception of the ID column. df1 and df2 are not the same size, and df2 has significantly more rows and IDs than df1. df2 also has more columns than df1.
Using the list of IDs in df1, I want to find the specific row in df2 and pull data from corresponding columns into df1. Ill use column 'a' as an example.
For each ID in df1, I want to identify whether that ID had the value OPEN or CLOSED in df2[a]. If the value of df2[a] = OPEN then df1[a] should display OPEN, otherwise it should display CLOSED.
I have tried solving this problem multiple ways using several packages but do not have a ton of experience. Any advice on how to do this is greatly appreciated!
Upvotes: 0
Views: 26
Reputation: 5297
This is essentially just a left join with dplyr::left_join()
.
df1 <-
tibble::tibble(
ID = c(111L, 222L, 333L),
a = NA
)
df2 <-
tibble::tribble(
~ID , ~a , ~b , ~c,
111L , "Open" , "No" , "Yes",
222L , "Closed" , "No" , "No",
333L , "Open" , "Yes" , "Yes",
444L , "Open" , "No" , "No",
555L , "Closed" , "No" , "Yes",
666L , "Open" , "Yes" , "Yes"
)
df1 |>
dplyr::select( # Retain only the column(s) you need
ID
) |>
dplyr::left_join(df2, by = "ID") # Join on the ID column
Result:
# A tibble: 3 x 4
ID a b c
<int> <chr> <chr> <chr>
1 111 Open No Yes
2 222 Closed No No
3 333 Open Yes Yes
Upvotes: 0