Reputation: 81
I have used the seqinr package to read a FASTA file containing some genes. Each gene has annotation attributes that give me a line like this:
> getAnnot(g[1])
">Translation:ENSANGP00000020176 Database:core Gene:ENSANGG00000017687 Clone:AAAB01008888 Contig:AAAB01008888_84 Chr:2R Basepair:42989807 Status:known"
I want the result to be just Gene:ENSANGG00000017687.
Thanks. Here is my code:
##rm(list=ls())
library(seqinr)
g<-seqinr::read.fasta('frthomas.fasta')
g2<-getAnnot(g[1:500])
Upvotes: 0
Views: 145
Reputation: 206566
You could also use regexec here. For example, if your string is stored in the variable a:
sapply(regmatches(a, regexec("Gene:(\\w+)\\b",a)), `[`, 2)
[1] "ENSANGG00000017687"
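For each input string, regmatches returns a character vector whose first element is the full match (Gene:ENSANGG00000017687) and whose second element is the text captured by (\\w+). The sapply call takes the second element, so you get just the gene ID. If you want the "Gene:" prefix as well, change the 2 to a 1.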
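If you want to keep the "Gene:" prefix and process all 500 headers from g2 at once, a shorter variant uses regexpr instead of regexec, because no capture group is needed. This is a sketch under stated assumptions: g2 is built by hand here (the second header is made up) so the code runs without frthomas.fasta, and it assumes getAnnot returns a list of character strings, which unlist flattens. Note that regmatches with regexpr silently drops headers that contain no Gene: field, so the result can be shorter than the input.

```r
# Hypothetical stand-in for g2 <- getAnnot(g[1:500]): a list of header strings.
# The second header is invented purely for illustration.
g2 <- list(
  ">Translation:ENSANGP00000020176 Database:core Gene:ENSANGG00000017687 Clone:AAAB01008888 Contig:AAAB01008888_84 Chr:2R Basepair:42989807 Status:known",
  ">Translation:ENSANGP00000099999 Database:core Gene:ENSANGG00000011111 Chr:3L Status:novel"
)

# Flatten the list into a character vector, one header per element
headers <- unlist(g2)

# regexpr finds the first "Gene:" followed by non-space characters;
# regmatches returns the matched text, including the "Gene:" prefix
genes <- regmatches(headers, regexpr("Gene:\\S+", headers))
print(genes)
# [1] "Gene:ENSANGG00000017687" "Gene:ENSANGG00000011111"
```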
Upvotes: 1
Reputation: 99371
After splitting the string at every space with strsplit, your desired result is the third element:
> string <- ">Translation:ENSANGP00000020176 Database:core Gene:ENSANGG00000017687 Clone:AAAB01008888 Contig:AAAB01008888_84 Chr:2R Basepair:42989807 Status:known"
> unlist(strsplit(string, " "))[3]
# [1] "Gene:ENSANGG00000017687"
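Indexing with [3] assumes the Gene field is always the third one in every header. A sketch that looks the field up by name instead, assuming every field has the Key:Value form shown above and that the header starts with ">", splits each field at its first colon and builds a named vector:

```r
string <- ">Translation:ENSANGP00000020176 Database:core Gene:ENSANGG00000017687 Clone:AAAB01008888 Contig:AAAB01008888_84 Chr:2R Basepair:42989807 Status:known"

# Drop the leading ">" and split on single spaces;
# [[1]] takes the pieces of the first (only) string
fields <- strsplit(sub("^>", "", string), " ", fixed = TRUE)[[1]]

# Split each "Key:Value" field at its first colon only
keys   <- sub(":.*$", "", fields)     # text before the first colon
values <- sub("^[^:]*:", "", fields)  # text after the first colon
annot  <- setNames(values, keys)

# Look the gene up by name rather than by position
gene <- paste0("Gene:", annot[["Gene"]])
print(gene)
# [1] "Gene:ENSANGG00000017687"
```

The same named vector also gives you the other fields, e.g. annot[["Chr"]] for the chromosome.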
Upvotes: 0