Reputation: 935
I have a string
"Signal recognition particle subunit SRP72 OS=Homo sapiens OX=9606 GN=SRP72 PE=1 SV=3"
and I would like to extract
"SRP72"
I am trying to use str_extract(), but it matches up to the last space in the string rather than the first space after "GN=":
str_extract(string = "Signal recognition particle subunit SRP72 OS=Homo sapiens OX=9606 GN=SRP72 PE=1 SV=3",
pattern = "(GN=).*( )")
Thus, the result I get is "GN=SRP72 PE=1 ". If possible, could you please give an answer that uses the str_extract() function?
Upvotes: 0
Views: 221
Reputation: 887961
We can use regmatches/regexpr in base R
string <- "Signal recognition particle subunit SRP72 OS=Homo sapiens OX=9606 GN=SRP72 PE=1 SV=3"
regmatches(string, regexpr("(?<=GN=)\\w+", string, perl = TRUE))
#[1] "SRP72"
Upvotes: 0
Reputation: 389325
Since you don't want to extract 'GN=' in the final output, we can make use of a lookbehind regex and extract the first word (\\w+) after the occurrence of "GN=".
string = "Signal recognition particle subunit SRP72 OS=Homo sapiens OX=9606 GN=SRP72 PE=1 SV=3"
stringr::str_extract(string, pattern = "(?<=GN=)\\w+")
#[1] "SRP72"
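As for why the original pattern overshoots: `.*` is greedy, so it consumes as much as it can and only stops at the last space in the string. A minimal sketch of the difference, assuming stringr is installed (the variable names `greedy`, `lazy` and `gene` are only for illustration):

```r
library(stringr)

string <- "Signal recognition particle subunit SRP72 OS=Homo sapiens OX=9606 GN=SRP72 PE=1 SV=3"

# Greedy: .* runs on to the LAST space in the string
greedy <- str_extract(string, "GN=.* ")
greedy
#[1] "GN=SRP72 PE=1 "

# Lazy: .*? stops at the FIRST space after "GN="
lazy <- str_extract(string, "GN=.*? ")
lazy
#[1] "GN=SRP72 "

# Lookbehind drops the "GN=" prefix, \\w+ stops before the space
gene <- str_extract(string, "(?<=GN=)\\w+")
gene
#[1] "SRP72"
```

The lazy version still keeps the "GN=" prefix and the trailing space, which is why the lookbehind form is the more convenient one here.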
In base R, we can use sub:
sub('.*GN=(\\w+).*', '\\1', string)
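One difference between the two approaches is worth keeping in mind when some strings have no "GN=" field. The sketch below uses a made-up vector `x` (not from the question) to show it: str_extract() returns NA for a non-match, whereas sub() returns the input unchanged.

```r
library(stringr)

# Hypothetical inputs: the second element has no "GN=" field
x <- c("OX=9606 GN=SRP72 PE=1", "OX=9606 PE=1")

# str_extract() gives NA where the pattern does not match
from_extract <- str_extract(x, "(?<=GN=)\\w+")
from_extract
#[1] "SRP72" NA

# sub() leaves a non-matching string as it is
from_sub <- sub(".*GN=(\\w+).*", "\\1", x)
from_sub
#[1] "SRP72"        "OX=9606 PE=1"
```

So if missing fields are possible, the str_extract() version is probably the safer choice, since the NA is easy to detect afterwards.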
Upvotes: 1