user1971853

Reputation: 23

R code for Retrieving the List of Names from Ensembl database

I wrote this program to convert Entrez IDs into gene names using R, but I am encountering the following error:

Error in .checkKeysAreWellFormed(keys) : 
keys must be supplied in a character vector with no NAs

The program:

 a <- read.csv("C:\\list.csv", header = FALSE)
 a2 <- a[!is.na(a)]
for(i in 1:length(a))
{
    if(is.na(a[i])==TRUE)
    {
    next;
    } else {
       a2<-c(a2,a[i]);
       a3 <-lookUp(a2, 'org.Hs.eg', 'SYMBOL') 
    }
}

And the list looks like this: (list.csv)

5921,9315,10175,58155,1112,1974,2033,2309,3015,3192,5217,5411,5527,6660,8125,9743,10439,11174,23077,23097,26520,56929,84146,109,1073,1783,1809,1839,3169,3187,3768,4857,5066,5496,5594,5683,5885,6328,7490

Where is the problem?

Upvotes: 1

Views: 571

Answers (1)

Martin Morgan

Reputation: 46856

lookUp is from the Bioconductor package annotate. We can generate the error above with

> library(annotate)
> lookUp(list("123"), 'org.Hs.eg', 'SYMBOL')
Error in .checkKeysAreWellFormed(keys) : 
  keys must be supplied in a character vector with no NAs

and correct it by providing a character vector rather than list

> lookUp("123", 'org.Hs.eg', 'SYMBOL')
$`123`
[1] "PLIN2"
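This is probably what happens in the question's code: read.csv() returns a data.frame, which is a list of columns, so a[i] is still a list and not a character vector. A minimal base-R sketch of the difference, assuming a shortened, made-up version of list.csv supplied inline through the text argument instead of a file:

```r
# Hypothetical stand-in for list.csv: one line of comma-separated Entrez IDs.
csv_text <- "5921,9315,10175,58155"

# read.csv() returns a data.frame, i.e. a list with one column per field.
a <- read.csv(text = csv_text, header = FALSE)
is.list(a)    # TRUE
class(a[1])   # "data.frame": single-bracket indexing keeps the list structure

# Flatten to a plain vector, convert to character, and drop any NAs.
ids <- as.character(unlist(a, use.names = FALSE))
ids <- ids[!is.na(ids)]
is.character(ids)   # TRUE
```

A vector like ids is the form of key that lookUp() accepts in the example above.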

If your file "list.csv" really contains the single line you indicate, then I might use

eid = strsplit(readLines("C:\\list.csv"), ",")[[1]]

to get a character vector of Entrez IDs; class(eid) will be "character". Clean it and do the look-up with

lookUp(eid[!is.na(eid)], "org.Hs.eg", "SYMBOL")

but a more 'modern' approach (with the org.Hs.eg.db package loaded) is

select(org.Hs.eg.db, eid, "SYMBOL")

which handles both NA and invalid keys with less fuss:

> select(org.Hs.eg.db, c(NA, "123", "xyz"), "SYMBOL")
  ENTREZID SYMBOL
1      123  PLIN2
2      xyz   <NA>
Warning message:
In .select(x, keys, cols, keytype, jointype = jointype) :
  'NA' keys have been removed
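If the readLines()/strsplit() route is used on a file that is less tidy than the one shown, the split fields may contain stray spaces or empty strings, which is.na() does not catch. A small base-R sketch of one way to clean them, assuming a made-up input line read from a text connection instead of C:\\list.csv:

```r
# Hypothetical input line with a stray space and an empty field.
con <- textConnection("5921, 9315,,10175")
lines <- readLines(con)
close(con)

# Split the single line on commas; the result is a character vector.
eid <- strsplit(lines, ",")[[1]]

# Trim whitespace, then drop NA and empty entries.
eid <- trimws(eid)
eid <- eid[!is.na(eid) & nzchar(eid)]
eid   # "5921" "9315" "10175"
```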

Upvotes: 2
