Reputation: 163
I rarely write Perl and don't know how to phrase the question. I am using Perl as a "filter" to go through files.
echo "this is a test" | perl -pe 's/(this).*(test)?/\1 \2/'
returns only
this
I am looking for
this test
Upvotes: 1
Views: 148
Reputation:
Since you're using Perl, this is a good way to do it.
Use this if you don't want to use any anchors.
Maybe you are not in a multi-line environment.
Btw, anchors are a crutch. Avoid using them if possible;
it will expand your mind.
(this)(?|.*(test)|.*())
https://regex101.com/r/1p4FVK/1
( this ) # (1)
(?| # Branch reset, reuse grp 2
.*
( test ) # (2)
|
.*
( ) # (2)
)
Without the branch reset it's (this)(?:.*(test)|.*())
and the replacement is $1 $2$3 (with the branch reset, $1 $2 is enough).
Upvotes: -1
Reputation: 9231
Regexes (the feature within the first section of the s/// operator) match against the provided text sequentially and greedily. This means that it will first find this (easy enough), then .* will match the entire rest of the string. (test)? is then matched against what remains, which is nothing, and since it's optional, it succeeds.
One way to prevent .* from matching the rest of the string before the next part gets a chance is to make it non-greedy; this is done by appending the ? quantifier modifier (not to be confused with the ? quantifier, which means zero-or-one). But this doesn't help here, because then it will just match the empty string (the shortest string it can match), and (test)? will also still match the empty string afterward, since the text immediately following this is not test.
Depending on what you are trying to do, there are a couple of possible solutions. The first is to make the (test) group non-optional by removing the ?, which will cause the engine to try smaller and smaller matches for .* until the following text successfully matches (test) (a regex feature known as backtracking). Another option is anchoring the match to the end of the string with $ after a non-greedy .*? so that .*? is extended one character at a time until the rest of the pattern can reach the end of the string, with (test) tried at each step before falling back to matching the empty string (a sort of reverse backtracking).
/(this).*(test)/
/(this).*?(test)?$/
As a side note, your replacement variables should be $1 and $2, not \1 and \2; backslash references are for use within the regex itself, and Perl supports them in the replacement only for compatibility with sed.
Upvotes: 5