user3729332

Reputation: 81

How to match a certain part of a string in R

I have used the seqinr package to read a FASTA file with some genes in it. Each gene has annotation attributes, which give me a line like this:

> getAnnot(g[1])

">Translation:ENSANGP00000020176 Database:core Gene:ENSANGG00000017687 Clone:AAAB01008888 Contig:AAAB01008888_84 Chr:2R Basepair:42989807 Status:known"

I want to extract just the gene field, so that the result is Gene:ENSANGG00000017687

Thanks. Here is my code:

##rm(list=ls())
library(seqinr)
g<-seqinr::read.fasta('frthomas.fasta')
g2<-getAnnot(g[1:500])

Upvotes: 0

Views: 145

Answers (2)

MrFlick

Reputation: 206566

You could also use regexec here. For example, if your string is stored in a variable named a:

sapply(regmatches(a, regexec("Gene:(\\w+)\\b",a)), `[`, 2)
[1] "ENSANGG00000017687"

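regmatches returns a list with one character vector per input string: the first element of each vector is the full match and the second is the captured group. Here we take the second element to get just the gene value. If you want the "Gene:" part as well, change the 2 to a 1.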
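To apply the same idea to many annotation lines at once, something like the following sketch may help. The vector annot and its second, made-up line are assumptions for illustration; in the question's setting it would presumably come from something like unlist(getAnnot(g[1:500])).

```r
# Hypothetical input: the first line is from the question, the second is a
# made-up line without a Gene field, to show what happens when nothing matches.
annot <- c(
  ">Translation:ENSANGP00000020176 Database:core Gene:ENSANGG00000017687 Clone:AAAB01008888 Contig:AAAB01008888_84 Chr:2R Basepair:42989807 Status:known",
  ">Translation:ENSANGP00000000001 Database:core Status:known"
)

# regexec records the position of the full match and of each capture group;
# regmatches turns those positions into the matched substrings.
parts <- regmatches(annot, regexec("Gene:(\\w+)", annot))

# Each element of parts is c(full match, group 1), or character(0) if no match.
gene_tags <- vapply(parts, function(p) if (length(p) >= 1) p[1] else NA_character_, character(1))
gene_ids  <- vapply(parts, function(p) if (length(p) >= 2) p[2] else NA_character_, character(1))

print(gene_tags)
print(gene_ids)
```

vapply is used here instead of sapply only so that the result is guaranteed to be a character vector; lines without a Gene field come back as NA.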

Upvotes: 1

Rich Scriven

Reputation: 99371

Your desired result is the third element after we split the string at every space with strsplit. Note that this relies on the Gene field always being in the third position.

> string <-  
    ">Translation:ENSANGP00000020176 Database:core Gene:ENSANGG00000017687 Clone:AAAB01008888 Contig:AAAB01008888_84 Chr:2R Basepair:42989807 Status:known"
> unlist(strsplit(string, " "))[3]
# [1] "Gene:ENSANGG00000017687"
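If the position of the Gene field is not guaranteed, a variation on the split approach is to select the field by its name rather than by index. This is only a sketch under the assumption that fields are space-separated and that at most one of them starts with "Gene:"; the variable names are illustrative.

```r
string <- ">Translation:ENSANGP00000020176 Database:core Gene:ENSANGG00000017687 Clone:AAAB01008888 Contig:AAAB01008888_84 Chr:2R Basepair:42989807 Status:known"

# Split on literal spaces, as in the answer above.
fields <- unlist(strsplit(string, " ", fixed = TRUE))

# Keep whichever field starts with "Gene:", regardless of where it sits.
gene_field <- grep("^Gene:", fields, value = TRUE)

# Optionally strip the "Gene:" prefix to leave only the identifier.
gene_id <- sub("^Gene:", "", gene_field)

print(gene_field)
print(gene_id)
```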

Upvotes: 0
