Reputation: 85
dat1 <- c('human(display_long)|uniprotkb:ESR1(gene name)')
dat2 <- c('human(display_long)|uniprotkb:TP53(gene name)')
dat3 <- c('human(display_long)|uniprotkb:GPX4(gene name)')
dat4 <- c('human(display_long)|uniprotkb:ALOX15(gene name)')
dat5 <- c('human(display_long)|uniprotkb:PGR(gene name)')
dat <- c(dat1,dat2,dat3,dat4,dat5)
How can I extract the gene name between 'human(display_long)|uniprotkb:' and '(gene name)' from each element of the vector dat? Thanks!
Upvotes: 2
Views: 47
Reputation: 887028
We can use str_remove_all from stringr:
library(stringr)
str_remove_all(dat, ".*uniprotkb:|\\(.*")
[1] "ESR1" "TP53" "GPX4" "ALOX15" "PGR"
Or use trimws from base R (the whitespace argument requires R 3.6.0 or later):
trimws(dat, whitespace = ".*uniprotkb:|\\(.*")
[1] "ESR1" "TP53" "GPX4" "ALOX15" "PGR"
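For reference, the same removal can be sketched with base R's gsub, which replaces every match of the pattern with an empty string. This is a minimal equivalent rather than something taken from the answer above, and it redefines dat so that it runs on its own.

```r
# Sample input copied from the question
dat <- c("human(display_long)|uniprotkb:ESR1(gene name)",
         "human(display_long)|uniprotkb:TP53(gene name)",
         "human(display_long)|uniprotkb:GPX4(gene name)",
         "human(display_long)|uniprotkb:ALOX15(gene name)",
         "human(display_long)|uniprotkb:PGR(gene name)")

# Delete everything up to and including "uniprotkb:" and
# everything from the next "(" to the end of the string
genes <- gsub(".*uniprotkb:|\\(.*", "", dat)
print(genes)
```

Like the two calls above, this keeps one result per input element, so the output always has the same length as dat.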
Upvotes: 0
Reputation: 39657
You can use regexpr and regmatches to extract the text between 'human(display_long)|uniprotkb:' and '(gene name)'.
regmatches(dat
, regexpr("(?<=human\\(display_long\\)\\|uniprotkb:).*(?=\\(gene name\\))"
, dat, perl=TRUE))
#[1] "ESR1" "TP53" "GPX4" "ALOX15" "PGR"
Where (?<=human\\(display_long\\)\\|uniprotkb:) is a positive lookbehind for 'human(display_long)|uniprotkb:', (?=\\(gene name\\)) is a positive lookahead for '(gene name)', and .* is the text in between.
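If lookarounds feel heavy, a capture group with regexec is another base R option. The sketch below is an illustration rather than part of the original answer; the third input element is a made-up non-matching string, included to show that it yields NA instead of being dropped.

```r
dat <- c("human(display_long)|uniprotkb:ESR1(gene name)",
         "human(display_long)|uniprotkb:TP53(gene name)",
         "unrelated text")   # hypothetical non-matching element

# regexec() records the full match plus each parenthesised group
m <- regmatches(dat, regexec("uniprotkb:([^(]+)\\(gene name\\)", dat))

# Element 1 of each result is the full match, element 2 the captured gene name;
# indexing an empty result with [2] gives NA
genes <- vapply(m, function(x) x[2], character(1))
print(genes)
```

By contrast, regmatches combined with regexpr returns only the matched substrings, so non-matching elements disappear from its result.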
Another way is to use sub, but note that when the pattern does not match, sub returns the input string unchanged rather than signalling a failure.
sub(".*human\\(display_long\\)\\|uniprotkb:(.*)\\(gene name\\).*", "\\1", dat)
#[1] "ESR1" "TP53" "GPX4" "ALOX15" "PGR"
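To make the no-match caveat concrete, here is a small sketch that pairs sub with grepl and turns unmatched elements into NA. The second input element and the variable name pat are made up for illustration.

```r
dat <- c("human(display_long)|uniprotkb:ESR1(gene name)",
         "unrelated text")   # hypothetical non-matching element
pat <- ".*human\\(display_long\\)\\|uniprotkb:(.*)\\(gene name\\).*"

genes <- sub(pat, "\\1", dat)
print(genes)                 # the second element comes back unchanged

# Mark elements that never matched as missing
genes[!grepl(pat, dat)] <- NA_character_
print(genes)
```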
Other ways not searching for the full pattern might be:
regmatches(dat, regexpr("(?<=:)[^(]*", dat, perl=TRUE))
sub(".*:([^(]*).*", "\\1", dat)
sub(".*:(.*)\\(.*", "\\1", dat)
Upvotes: 1
Reputation: 12699
Using stringr and a lookbehind, you could try this:
library(stringr)
str_extract(dat, "(?<=:)[A-Za-z0-9]+")
#[1] "ESR1" "TP53" "GPX4" "ALOX15" "PGR"
This assumes that the gene name immediately follows the only colon in each string (str_extract returns just the first match).
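A side note on character classes in patterns like this one: in ASCII, the range A-z also covers the six characters between Z and a ([, \, ], ^, _ and `), whereas A-Za-z0-9 is limited to letters and digits. The base R check below is a minimal sketch; it uses perl = TRUE so that ranges are interpreted by code point, and the test strings are made up for illustration.

```r
# Hypothetical test strings: a gene symbol and two punctuation characters
x <- c("ESR1", "_", "^")

# A-z spans code points 65 to 122, which includes "_" and "^"
wide <- grepl("^[A-z0-9]+$", x, perl = TRUE)

# A-Za-z0-9 accepts only ASCII letters and digits
strict <- grepl("^[A-Za-z0-9]+$", x, perl = TRUE)

print(wide)
print(strict)
```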
Upvotes: 0
Reputation: 388862
You can try this regex, which extracts the word characters between 'uniprotkb:' and the opening round bracket '(' that follows them.
sub('.*uniprotkb:(\\w+)\\(.*', '\\1', dat)
#[1] "ESR1" "TP53" "GPX4" "ALOX15" "PGR"
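One thing to keep in mind is that \\w matches only letters, digits and underscores. The sketch below uses a made-up input with a hyphenated symbol (not from the question's data) to show how that differs from the broader class [^(]+.

```r
# Hypothetical input with a hyphenated gene symbol
x <- "human(display_long)|uniprotkb:HLA-A(gene name)"

# \\w+ stops at the hyphen, so the pattern does not match and sub() returns x unchanged
w <- sub(".*uniprotkb:(\\w+)\\(.*", "\\1", x)

# [^(]+ accepts any character other than "(", so the hyphen is kept
any_char <- sub(".*uniprotkb:([^(]+)\\(.*", "\\1", x)

print(w)
print(any_char)
```

For the five symbols in the question both patterns give the same result; the choice only matters if the real data contains symbols with other characters.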
Upvotes: 0