Filipe Figueiredo
Filipe Figueiredo

Reputation: 35

Match and replace row values across two dataframes

I am trying to replace entrezgene_accession names with entrezgene_id, but I am not able to figure it out.
The idea is to replace the gene names such as cd37, and catb in df1, with their entrezgene_id that is in df2.
I have been trying to combine datasets using dplyr, but that has not worked.

# df1: 2,002 × 1
   id       
   <chr>    
 1 106590043
 2 cd37     
 3 106577144
 4 106561987
 5 106569503
 6 106571198
 7 106573872
 8 106601676
 9 106612275
10 catb     
# … with 1,992 more rows


# df2: 426 × 2
    entrezgene_accession entrezgene_id
    <chr>                <chr>        
 37 catb                 100195493    
 38 catk                 100195370    
 39 catl1                100286607    
 40 cats                 100196462    
 41 cav2                 106573118    
 42 cav2                 100196537    
 43 cb055                100306867    
 44 cbx6                 106591466    
 45 ccdc178              106569132    
 46 ccdc84               106603745    
 47 ccm2                 106571003    
 48 ccnb1                106563318    
 49 ccnd1                100306852    
 50 ccr3                 100380477    
 51 ccr6                 100194943    
 52 cd164                106607963    
 53 cd37                 100195746      
 # … with 416 more rows

Upvotes: 1

Views: 37

Answers (1)

Joe Erinjeri
Joe Erinjeri

Reputation: 1250

left_join from dplyr can help do the trick

library(dplyr)

df1<-tibble::tribble(
               ~id,
       "106590043",
            "cd37",
            "catb"
       )

df2<-tibble::tribble(
       ~entrezgene_accession, ~entrezgene_id,
                      "catb",     "100286607",
                      "catk",     "100195370",
                     "catl1",     "100286607",
                      "cd37",     "100195746"
       )

df_combined<-df1 %>%
  left_join(df2, by=c("id"="entrezgene_accession")) %>%
  mutate(complete_id=if_else(is.na(entrezgene_id),id,entrezgene_id))

df_combined
#> # A tibble: 3 × 3
#>   id        entrezgene_id complete_id
#>   <chr>     <chr>         <chr>      
#> 1 106590043 <NA>          106590043  
#> 2 cd37      100195746     100195746  
#> 3 catb      100286607     100286607

Created on 2022-01-09 by the reprex package (v2.0.1)

Upvotes: 1

Related Questions