Reputation: 23206
I am trying to extract 22 chocolates
from the following string:
SOMETEXT for 2 FFXX. Another 22 chocolates & 45 chamkila.
with the regex \\d+\\s*(chocolates.|chocolate.)
I used:
grep("\\d+\\s*(chocolates.|chocolate.)",s)
but it does not return the string 22 chocolates. How can I extract the part that matches the regex?
Upvotes: 2
Views: 84
Reputation: 626699
Your original call does not return 22 chocolates
because grep does not extract matches: it only reports which items of a character vector contain a match anywhere inside (their indices by default, or the whole items with value = TRUE).
To get the matched text itself, use the pattern with an extraction function instead.
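To illustrate the point about grep, here is a minimal sketch using the pattern and string from the question (the variable names s and p are only for illustration):

```r
s <- "SOMETEXT for 2 FFXX. Another 22 chocolates & 45 chamkila."
p <- "\\d+\\s*(chocolates.|chocolate.)"

# grep() reports WHICH elements contain a match (here: element 1)
idx <- grep(p, s)

# value = TRUE returns the whole matching element, not the matched part
whole <- grep(p, s, value = TRUE)

print(idx)
print(whole)
```

In both cases the result describes the element of s that matched, never the substring 22 chocolates.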
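Also, note that the (chocolates.|chocolate.)
alternation group can be shortened to chocolates?.
since the only difference between the two alternatives is the plural s,
which can be made optional with a ?
quantifier (1 or 0 occurrences). The trailing . is dropped in the examples below, because it would add the character after the word to the match.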
One such extraction function is stringr::str_extract
(use str_extract_all
to get all occurrences):
> library(stringr)
> x <- " SOMETEXT for 2 FFXX. Another 22 chocolates & 45 chamkila."
> p <- "\\d+\\s*chocolates?"
> str_extract(x, p)
[1] "22 chocolates"
Or use the base R regmatches
/regexpr
approach (with gregexpr
to extract multiple occurrences), which needs no extra package:
> x <- " SOMETEXT for 2 FFXX. Another 22 chocolates & 45 chamkila."
> p <- "\\d+\\s*chocolates?"
> regmatches(x, regexpr(p, x))
[1] "22 chocolates"
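For the multiple-occurrence case mentioned above, a minimal base R sketch might look like this. The input string here is made up for illustration so that it contains two matches:

```r
# Hypothetical input with two matches (not the string from the question)
x <- "I ate 1 chocolate, then 22 chocolates & 45 chamkila."
p <- "\\d+\\s*chocolates?"

# gregexpr() finds every match; regmatches() then returns a list
# with one character vector per element of x
all_matches <- regmatches(x, gregexpr(p, x))[[1]]
print(all_matches)

# With regexpr(), an input without a match yields an empty result
none <- regmatches("nothing here", regexpr(p, "nothing here"))
print(length(none))
```

Because regmatches returns a list when combined with gregexpr, the [[1]] picks out the matches for the single input string.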
Upvotes: 0
Reputation: 520928
Here is an option using sub
from base R:
x <- "SOMETEXT for 2 FFXX. Another 22 chocolates & 45 chamkila."
sub(".*?(\\d+ chocolates?).*", "\\1", x)
[1] "22 chocolates"
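The part of the pattern in parentheses, (\\d+ chocolates?)
, is a capture group, and its match is available as \\1
in the replacement string. Because the surrounding .*? and .* consume the rest of the input, sub
replaces the whole string with just the captured text.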
Edit:
As you have seen, if sub
cannot find a match, it returns the input string unchanged. This behavior usually makes sense: when there is nothing to substitute, the input should be left as it is.
If you need to find out whether or not the pattern matches, then calling grep
is one option. It returns the index of each matching element, or integer(0) if nothing matches:
grep(".*(\\d+ chocolates?).*",x,value = FALSE)
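One way to combine the match check with the extraction is sketched below. It uses grepl, the logical counterpart of grep, rather than grep itself, and the helper name extract_chocolates is hypothetical. perl = TRUE is set here only so that the lazy .*? follows PCRE semantics; that is a choice made for this sketch, not something the answer above requires.

```r
# Hypothetical helper: return the captured text, or NA when there is no match
extract_chocolates <- function(x) {
  p <- "\\d+ chocolates?"
  full <- paste0(".*?(", p, ").*")   # same shape as the sub() pattern above
  ifelse(grepl(p, x, perl = TRUE),
         sub(full, "\\1", x, perl = TRUE),
         NA_character_)
}

res <- extract_chocolates(c("Another 22 chocolates & 45 chamkila.",
                            "no sweets here"))
print(res)
```

Returning NA for non-matching inputs avoids mistaking an unchanged input string for a successful extraction.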
Upvotes: 4