Match and replace row values across two dataframes

Question

I am trying to replace entrezgene_accession names with entrezgene_id, but I am not able to figure it out.
The idea is to replace the gene names such as cd37, and catb in df1, with their entrezgene_id that is in df2.
I have been trying to combine datasets using dplyr, but that has not worked.

# df1: 2,002 × 1
   id       
       
 1 106590043
 2 cd37     
 3 106577144
 4 106561987
 5 106569503
 6 106571198
 7 106573872
 8 106601676
 9 106612275
10 catb     
# … with 1,992 more rows


# df2: 426 × 2
    entrezgene_accession entrezgene_id
                            
 37 catb                 100195493    
 38 catk                 100195370    
 39 catl1                100286607    
 40 cats                 100196462    
 41 cav2                 106573118    
 42 cav2                 100196537    
 43 cb055                100306867    
 44 cbx6                 106591466    
 45 ccdc178              106569132    
 46 ccdc84               106603745    
 47 ccm2                 106571003    
 48 ccnb1                106563318    
 49 ccnd1                100306852    
 50 ccr3                 100380477    
 51 ccr6                 100194943    
 52 cd164                106607963    
 53 cd37                 100195746      
 # … with 416 more rows

Joe Erinjeri · Accepted Answer

left_join from dplyr can help do the trick

library(dplyr)

df1<-tibble::tribble(
               ~id,
       "106590043",
            "cd37",
            "catb"
       )

df2<-tibble::tribble(
       ~entrezgene_accession, ~entrezgene_id,
                      "catb",     "100286607",
                      "catk",     "100195370",
                     "catl1",     "100286607",
                      "cd37",     "100195746"
       )

df_combined<-df1 %>%
  left_join(df2, by=c("id"="entrezgene_accession")) %>%
  mutate(complete_id=if_else(is.na(entrezgene_id),id,entrezgene_id))

df_combined
#> # A tibble: 3 × 3
#>   id        entrezgene_id complete_id
#>                       
#> 1 106590043           106590043  
#> 2 cd37      100195746     100195746  
#> 3 catb      100286607     100286607

^{Created on 2022-01-09 by the reprex package (v2.0.1)}

Match and replace row values across two dataframes

Answers (1)

Related Questions