Reputation: 49448
I don't quite understand why this doesn't result in "test" and would appreciate an explanation:
a = "blah test"
sub('^.*(test|$)', '\\1', a)
# [1] ""
Compare it to the sed expression:
echo 'blah test' | sed -r 's/^.*(test|$)/\1/'
# test
echo 'blah blah' | sed -r 's/^.*(test|$)/\1/'
#
FWIW, the following achieves what I want in R (and is equivalent to the sed results above):
sub('^.*(test)|^.*', '\\1', a)
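A quick sketch applying that workaround to both example strings at once. The vector x is introduced here only for illustration, and the expected values are the ones the sed commands above produce ("test" and an empty line).

```r
# The two example strings from the question
x <- c("blah test", "blah blah")

# First alternative: capture "test" when it is present.
# Second alternative: match the whole line; group 1 is then unset,
# which sub() substitutes as an empty string.
res <- sub('^.*(test)|^.*', '\\1', x)
print(res)
```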
Upvotes: 4
Views: 963
Reputation: 49810
You need to make the ^.* non-greedy by writing ^.*? instead:
> sub('^.*?(test|$)', '\\1', "blah test")
[1] "test"
> sub('^.*?(test|$)', '\\1', "blah blah")
[1] ""
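The same fix works on a whole character vector, and a minimal sketch suggests it behaves identically whether sub() uses its default engine or PCRE (perl = TRUE). The vector x is just the two example strings.

```r
x <- c("blah test", "blah blah")

# Lazy .*? stops at the first position where (test|$) can match
lazy_default <- sub('^.*?(test|$)', '\\1', x)

# Same pattern run through PCRE
lazy_pcre <- sub('^.*?(test|$)', '\\1', x, perl = TRUE)

print(lazy_default)
print(lazy_pcre)
```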
Upvotes: 5
Reputation: 5012
The regex engine starts with the greedy .*, which matches all the characters up to the end of the string. It then tries to match (test|$), i.e. either the literal string "test" or the end of the string. At that position "test" cannot match, but $ matches the end of the string right away, so the overall match succeeds without the .* giving back any characters.
As a result, the capture group matches an empty string at the end of the line, and \\1 replaces the whole string with nothing.
I think sed uses a POSIX NFA, which tries to find the longest match in an alternation; this differs from R, which seems to use a traditional NFA.
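In R the engine is selected per call: sub() uses its default extended-regex engine unless perl = TRUE switches to PCRE, a Perl-style backtracking engine. A minimal sketch comparing the two on the original greedy pattern; in both cases the greedy .* takes the whole string, so neither call reproduces the sed output.

```r
a <- "blah test"

# Original greedy pattern: default engine versus PCRE
greedy_default <- sub('^.*(test|$)', '\\1', a)
greedy_pcre    <- sub('^.*(test|$)', '\\1', a, perl = TRUE)

print(c(default = greedy_default, pcre = greedy_pcre))
```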
Upvotes: 2