This is probably a simple question, but I suspect I'm not searching Google with the right keywords to find the answer, so apologies in advance.
I have one Excel file with about 10,000 gene names from some Brassica plants I had sequenced (in random order), and another file with the same (and more) gene names (sorted), with the Arabidopsis gene each one corresponds to in the adjacent column.
So for example:
File 1:
File 2:
Essentially, I want to annotate my sequenced Brassica genes (file 1) with their matching Arabidopsis names (second column of file 2) without reordering file 1. In other words, I just want to add a column to file 1 so that each gene sits next to its correct Arabidopsis name.
I have tried merging the lists in R, but that doesn't work. Does anyone know how I could do this in R?
Thank you very much for any help.
It would really help if you could post the R code you have used so far. In the absence of that, we can only guess which data structures you're actually dealing with.
Anyway, your problem can be solved in a straightforward manner using the tidyverse.
Here's a rough draft:
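library(tidyverse)

# Assumes both Excel sheets were saved as CSV files without a header row
df_bras <- read_csv(
  "brassica_genes.csv",
  col_names = c("gene_bras"),
  col_types = "c")

df_arab <- read_csv(
  "arabidopsis_genes.csv",
  col_names = c("gene_bras", "gene_arab"),  # was misspelled as col_name
  col_types = "cc")

# left_join() keeps every row of df_bras, in its original order
df <- df_bras %>% left_join(df_arab, by = "gene_bras")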
The resulting data frame df would contain all Brassica genes, in the same order as in file 1, each with its Arabidopsis gene name if it is present in df_arab, or NA otherwise.
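To check the approach without the real files, here is a minimal sketch on a few hypothetical gene IDs (Bra_1, At_a and so on are made up for illustration and are not from the question's data). It loads only dplyr rather than the whole tidyverse, and also shows a base R alternative using match(), which likewise keeps file 1's order. By contrast, base merge() sorts the result on the key column by default (sort = TRUE) and drops unmatched genes unless all.x = TRUE is set, which may be why merging appeared not to work.

```r
library(dplyr)

# Hypothetical toy data standing in for the two files (IDs are made up).
# File 1: sequenced Brassica genes, in arbitrary order.
df_bras <- data.frame(
  gene_bras = c("Bra_3", "Bra_1", "Bra_9", "Bra_2"),
  stringsAsFactors = FALSE
)

# File 2: sorted lookup table, including genes that are not in file 1.
df_arab <- data.frame(
  gene_bras = c("Bra_1", "Bra_2", "Bra_3", "Bra_4"),
  gene_arab = c("At_a", "At_b", "At_c", "At_d"),
  stringsAsFactors = FALSE
)

# Each Brassica ID should appear at most once in the lookup table;
# otherwise left_join() would return one row per match.
stopifnot(anyDuplicated(df_arab$gene_bras) == 0)

# tidyverse: keep all rows of df_bras, in their original order.
df <- df_bras %>% left_join(df_arab, by = "gene_bras")
print(df)

# Base R alternative: find each gene's row in df_arab with match();
# genes with no match get NA.
df_base <- df_bras
df_base$gene_arab <- df_arab$gene_arab[match(df_base$gene_bras, df_arab$gene_bras)]
```

Here Bra_9 has no entry in the lookup table, so it keeps its place in the output with NA as its Arabidopsis name.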