Reputation: 35
I am trying to replace entrezgene_accession names with entrezgene_id, but I am not able to figure it out.
The idea is to replace the gene names such as cd37, and catb in df1, with their entrezgene_id that is in df2.
I have been trying to combine datasets using dplyr, but that has not worked.
# df1: 2,002 × 1
id
<chr>
1 106590043
2 cd37
3 106577144
4 106561987
5 106569503
6 106571198
7 106573872
8 106601676
9 106612275
10 catb
# … with 1,992 more rows
# df2: 426 × 2
entrezgene_accession entrezgene_id
<chr> <chr>
37 catb 100195493
38 catk 100195370
39 catl1 100286607
40 cats 100196462
41 cav2 106573118
42 cav2 100196537
43 cb055 100306867
44 cbx6 106591466
45 ccdc178 106569132
46 ccdc84 106603745
47 ccm2 106571003
48 ccnb1 106563318
49 ccnd1 100306852
50 ccr3 100380477
51 ccr6 100194943
52 cd164 106607963
53 cd37 100195746
# … with 416 more rows
Upvotes: 1
Views: 37
Reputation: 1250
left_join from dplyr can help do the trick
library(dplyr)
df1<-tibble::tribble(
~id,
"106590043",
"cd37",
"catb"
)
df2<-tibble::tribble(
~entrezgene_accession, ~entrezgene_id,
"catb", "100286607",
"catk", "100195370",
"catl1", "100286607",
"cd37", "100195746"
)
df_combined<-df1 %>%
left_join(df2, by=c("id"="entrezgene_accession")) %>%
mutate(complete_id=if_else(is.na(entrezgene_id),id,entrezgene_id))
df_combined
#> # A tibble: 3 × 3
#> id entrezgene_id complete_id
#> <chr> <chr> <chr>
#> 1 106590043 <NA> 106590043
#> 2 cd37 100195746 100195746
#> 3 catb 100286607 100286607
Created on 2022-01-09 by the reprex package (v2.0.1)
Upvotes: 1