Reputation: 7351
I have a long vector of strings containing a market name and other stuff
S = c('123_GOLD_534', '531_SILVER_dfds', '93_COPPER_29dad', '452_GOLD_deww')
and another vector contains all the possible markets
V = c('GOLD','SILVER')
How can I extract the market name bit from S? Basically I want to loop over V and S, and replace S[j] with V[i] if grepl(V[i], S[j]).
So the result should look like
c('GOLD','SILVER',NA,'GOLD')
Upvotes: 2
Views: 398
Reputation: 627468
You may use str_extract from stringr:
> library(stringr)
> str_extract(S, paste(V, collapse="|"))
[1] "GOLD" "SILVER" NA "GOLD"
The paste(V, collapse="|") call creates a regex like GOLD|SILVER, so str_extract returns the first GOLD or SILVER found in each string. If the regex does not match, it returns NA for that element.
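If you would rather avoid the stringr dependency, the same idea can be sketched in base R with regexpr and regmatches. This is only an illustration under the assumption that the names in V contain no regex metacharacters; the variable names pattern, m and result are made up for the example.

```r
S <- c('123_GOLD_534', '531_SILVER_dfds', '93_COPPER_29dad', '452_GOLD_deww')
V <- c('GOLD', 'SILVER')

# Build the alternation pattern "GOLD|SILVER"
pattern <- paste(V, collapse = "|")

# regexpr() gives the position of the first match in each string, or -1 if none
m <- regexpr(pattern, S)

# Start from all NA, then fill in only the elements that actually matched;
# regmatches() returns the matched substrings for the matching elements only
result <- rep(NA_character_, length(S))
result[m != -1] <- regmatches(S, m)
result
```

Unlike str_extract, regmatches drops non-matching elements, which is why the NA placeholders are filled in by hand here.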
Note that if you need to match GOLD or SILVER only when enclosed with _ symbols, replace paste(V, collapse="|") with paste0("(?<=_)(?:", paste(V, collapse="|"), ")(?=_)"):
> str_extract(S, paste0("(?<=_)(?:", paste(V, collapse="|"), ")(?=_)"))
[1] "GOLD" "SILVER" NA "GOLD"
This creates a regex like (?<=_)(?:GOLD|SILVER)(?=_), which only matches GOLD or SILVER if there is a _ in front (due to the (?<=_) positive lookbehind) and a _ after the value (due to the (?=_) positive lookahead). Lookarounds do not add matched text to the match (they are non-consuming), so the underscores are not part of the extracted value.
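To see where the two patterns differ, here is a small sketch on a hypothetical input vector S2 (not from the question) in which GOLD appears once inside a longer token and once as a whole token. The names S2, plain and bounded are invented for the illustration.

```r
library(stringr)

V <- c('GOLD', 'SILVER')

# Hypothetical inputs: GOLD inside the longer token GOLDEN, then GOLD on its own
S2 <- c('1_GOLDEN_x', '2_GOLD_y')

plain   <- paste(V, collapse = "|")              # GOLD|SILVER
bounded <- paste0("(?<=_)(?:", plain, ")(?=_)")  # (?<=_)(?:GOLD|SILVER)(?=_)

# The plain alternation also matches the GOLD inside GOLDEN
plain_result <- str_extract(S2, plain)

# The lookaround version requires _ on both sides, so GOLDEN is rejected
bounded_result <- str_extract(S2, bounded)

plain_result
bounded_result
```

For the sample data in the question both patterns give the same result, so the stricter version only matters if a market name can occur as part of a longer token.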
Upvotes: 4