KLM117
KLM117

Reputation: 467

filling in multiple columns based on a conditional?

I have two rows with ~4000 parent and child ID's for different cancers, I've created two columns next to them titled "Child_Name" and "Parent_Name". df1 is like so:

Parent ID Child ID Parent Name Child Name
D015179 D003110
D018307 D002294

In another data frame, I have parent and child ID's in one row, and the names of them in the adjacent column:

Cancer_ID Cancer
D015179 Colorectal Neoplasms
D018307 Neoplasms, Squamous Cell
D002294 Carcinoma, Squamous Cell
D003110 Colonic Neoplasms

Essentially, I want to fill in Parent Name and Child Name in df1 by finding the corresponding cancer_ID in the df2, and place the cancer name in either the child or parent name respectively, so that it looks like this :

Parent ID Child ID Parent Name Child Name
D015179 D003110 Colorectal Neoplasms Colonic Neoplasms
D018307 D002294 Neoplasms, Squamous Cell Carcinoma, Squamous Cell

I believe there is likely a dplyr solution for this, but I haven't been able to come up with anything solid, as always, any help would be greatly appreciated!

These are dput()'s of the first 20 rows of df1 and df2 respectively that can be worked with directly (I hope I have presented these in the correct format):

df1:

structure(list(Parent_ID = c("D015179", "D015179", "D001932", 
"D002528", "D018307", "D018307", "D003110", "D012004", "D015179", 
"D015179", "D009442", "D009455", "D018358", "D018358", "D018295", 
"D018295", "D001984", "D001984", "D010235", "D010235"), Child_ID = c("D003110", 
"D003110", "D002528", "D001932", "D002294", "D002294", "D012004", 
"D003110", "D012004", "D012004", "D009455", "D009442", "D018278", 
"D018278", "D002280", "D002280", "D002283", "D002283", "D010236", 
"D010236"), Child_Name = c(NA, NA, NA, NA, NA, NA, NA, NA, NA, 
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA), Parent_Name = c(NA, 
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 
NA, NA, NA)), row.names = c(1L, 2L, 3L, 4L, 7L, 8L, 11L, 12L, 
15L, 16L, 27L, 28L, 31L, 32L, 37L, 38L, 39L, 40L, 53L, 54L), class = "data.frame")

df2:

structure(list(Cancer_ID = c("D009369", "D003560", "D001845", 
"D017824", "D007570", "D009631", "D009807", "D003803", "D018333", 
"D010509", "D011842", "D001935", "D047688", "D001994", "D017043", 
"D015529", "D004814", "D004934", "D005497", "D045888"), Cancer = c("Neoplasms", 
"Cysts", "Bone Cysts", "Bone Cysts, Aneurysmal", "Jaw Cysts", 
"Nonodontogenic Cysts", "Odontogenic Cysts", "Dentigerous Cyst", 
"Odontogenic Cyst, Calcifying", "Periodontal Cyst", "Radicular Cyst", 
"Branchioma", "Breast Cyst", "Bronchogenic Cyst", "Chalazion", 
"Choledochal Cyst", "Epidermal Cyst", "Esophageal Cyst", "Follicular Cyst", 
"Ganglion Cysts")), row.names = c(NA, 20L), class = "data.frame")

Upvotes: 1

Views: 151

Answers (1)

AnilGoyal
AnilGoyal

Reputation: 26218

A slight variation of my friend's answer, using match() and gsub() in .names = argument of across

df <- read.table(header = T, text = "Parent_ID  Child_ID
D015179 D003110     
D018307 D002294")


lookup <- read.table(header = T, text = "Cancer_ID  Cancer
D015179 'Colorectal Neoplasms'
D018307 'Neoplasms, Squamous Cell'
D002294 'Carcinoma, Squamous Cell'
D003110 'Colonic Neoplasms'")

library(dplyr, warn.conflicts = F)

df %>% mutate(across(everything(), ~ lookup$Cancer[match(., lookup$Cancer_ID)],
                     .names = '{gsub("_ID", "_name", .col)}'))

#>   Parent_ID Child_ID              Parent_name               Child_name
#> 1   D015179  D003110     Colorectal Neoplasms        Colonic Neoplasms
#> 2   D018307  D002294 Neoplasms, Squamous Cell Carcinoma, Squamous Cell

Created on 2021-06-21 by the reprex package (v2.0.0)

Upvotes: 2

Related Questions