filling in multiple columns based on a conditional?

Question

I have two rows with ~4000 parent and child ID's for different cancers, I've created two columns next to them titled "Child_Name" and "Parent_Name". df1 is like so:

Parent ID	Child ID	Parent Name	Child Name
D015179	D003110
D018307	D002294

In another data frame, I have parent and child ID's in one row, and the names of them in the adjacent column:

Cancer_ID	Cancer
D015179	Colorectal Neoplasms
D018307	Neoplasms, Squamous Cell
D002294	Carcinoma, Squamous Cell
D003110	Colonic Neoplasms

Essentially, I want to fill in Parent Name and Child Name in df1 by finding the corresponding cancer_ID in the df2, and place the cancer name in either the child or parent name respectively, so that it looks like this :

Parent ID	Child ID	Parent Name	Child Name
D015179	D003110	Colorectal Neoplasms	Colonic Neoplasms
D018307	D002294	Neoplasms, Squamous Cell	Carcinoma, Squamous Cell

I believe there is likely a dplyr solution for this, but I haven't been able to come up with anything solid, as always, any help would be greatly appreciated!

These are dput()'s of the first 20 rows of df1 and df2 respectively that can be worked with directly (I hope I have presented these in the correct format):

df1:

structure(list(Parent_ID = c("D015179", "D015179", "D001932", 
"D002528", "D018307", "D018307", "D003110", "D012004", "D015179", 
"D015179", "D009442", "D009455", "D018358", "D018358", "D018295", 
"D018295", "D001984", "D001984", "D010235", "D010235"), Child_ID = c("D003110", 
"D003110", "D002528", "D001932", "D002294", "D002294", "D012004", 
"D003110", "D012004", "D012004", "D009455", "D009442", "D018278", 
"D018278", "D002280", "D002280", "D002283", "D002283", "D010236", 
"D010236"), Child_Name = c(NA, NA, NA, NA, NA, NA, NA, NA, NA, 
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA), Parent_Name = c(NA, 
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 
NA, NA, NA)), row.names = c(1L, 2L, 3L, 4L, 7L, 8L, 11L, 12L, 
15L, 16L, 27L, 28L, 31L, 32L, 37L, 38L, 39L, 40L, 53L, 54L), class = "data.frame")

df2:

structure(list(Cancer_ID = c("D009369", "D003560", "D001845", 
"D017824", "D007570", "D009631", "D009807", "D003803", "D018333", 
"D010509", "D011842", "D001935", "D047688", "D001994", "D017043", 
"D015529", "D004814", "D004934", "D005497", "D045888"), Cancer = c("Neoplasms", 
"Cysts", "Bone Cysts", "Bone Cysts, Aneurysmal", "Jaw Cysts", 
"Nonodontogenic Cysts", "Odontogenic Cysts", "Dentigerous Cyst", 
"Odontogenic Cyst, Calcifying", "Periodontal Cyst", "Radicular Cyst", 
"Branchioma", "Breast Cyst", "Bronchogenic Cyst", "Chalazion", 
"Choledochal Cyst", "Epidermal Cyst", "Esophageal Cyst", "Follicular Cyst", 
"Ganglion Cysts")), row.names = c(NA, 20L), class = "data.frame")

AnilGoyal · Accepted Answer

A slight variation of my friend's answer, using match() and gsub() in .names = argument of across

df <- read.table(header = T, text = "Parent_ID  Child_ID
D015179 D003110     
D018307 D002294")


lookup <- read.table(header = T, text = "Cancer_ID  Cancer
D015179 'Colorectal Neoplasms'
D018307 'Neoplasms, Squamous Cell'
D002294 'Carcinoma, Squamous Cell'
D003110 'Colonic Neoplasms'")

library(dplyr, warn.conflicts = F)

df %>% mutate(across(everything(), ~ lookup$Cancer[match(., lookup$Cancer_ID)],
                     .names = '{gsub("_ID", "_name", .col)}'))

#>   Parent_ID Child_ID              Parent_name               Child_name
#> 1   D015179  D003110     Colorectal Neoplasms        Colonic Neoplasms
#> 2   D018307  D002294 Neoplasms, Squamous Cell Carcinoma, Squamous Cell

^{Created on 2021-06-21 by the reprex package (v2.0.0)}

filling in multiple columns based on a conditional?

Answers (1)

Related Questions