This question focuses on PCRE regular expressions as used by `grep -P`.
Imagine I have the string `abcRabcSxyxz` and search for a substring that starts with `abc` and ends with `x`, with the restriction that no shorter substring of this match would also match.
My first attempt was a non-greedy regexp,
grep -Po 'abc.*?x' <<<abcRabcSxyxz
but this returns abcRabcSx, while I would like to find just abcSx. It is obvious why even my non-greedy attempt still provides a match which is too long; I need the regexp engine to try harder. My second attempt was
grep -Po '(?>abc.*?)x' <<<abcRabcSxyxz
which did not provide a match at all (maybe I don't really understand the usage of `(?>...)` explained here).
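A sketch of why both attempts behave this way, assuming GNU grep built with PCRE support (with `-o`, the `-b` option prints the 0-based byte offset of each match). The engine tries start positions from left to right and reports the first match it finds, so the lazy `.*?` can only make the end of the match early; it never moves the start past the first `abc`. Inside the atomic group `(?>abc.*?)`, the lazy `.*?` first matches zero characters, and once the group has succeeded its backtracking positions are discarded, so `.*?` can never grow and the whole pattern behaves like `abcx`. The helper name `atomic_try` is hypothetical.

```shell
# Attempt 1: the lazy match starts at offset 0, i.e. at the first "abc".
printf '%s\n' abcRabcSxyxz | grep -Pob 'abc.*?x'    # 0:abcRabcSx

# Attempt 2, wrapped in a hypothetical helper; "|| true" keeps grep's
# no-match exit status (1) from aborting a script run with "set -e".
atomic_try() {
  printf '%s\n' "$1" | grep -Po '(?>abc.*?)x' || true
}
atomic_try abcRabcSxyxz   # no output: .*? is frozen at zero characters
atomic_try abcx           # abcx: only "abc" immediately followed by "x" matches
```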
Does anyone have an easy solution to my problem?
UPDATE: I see from the comments that my example does not precisely explain what I am searching for, so here is a more general description:
I am searching for matches of the form PXQ, where P, X and Q are arbitrary patterns, and X should not contain a match of P. Plus, I don't want to literally retype the pattern P inside X.
For instance, `[(][^(]*[)]` would be a possible (but not satisfying) solution for the concrete case where I am searching for a parenthesized expression that does not contain another parenthesized expression (here, P is `[(]`, X is an arbitrary string, and Q is `[)]`). But even this example shows that I have to literally repeat the information contained in P when specifying the middle part (`[^(]*`), to make sure that P does not occur there. I am looking for a way to make this explicit repetition unnecessary.
Interesting question. Much of this was worked out in the comments; thanks to Casimir et Hippolyte, Felix Kling, and user1934428.
The solution uses PCRE and is as follows:
grep -Po '(abc)(?:(?!(?1)).)*?x' <<< abcRabcSxyxz
We know the result will start with "abc" and end with "x", so let us walk through how the pattern works.
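`(abc)` to start; this is capturing group 1.
`(?:` opens a non-capturing group, so the subpattern is neither captured nor counted in the group numbering.
`(?!(?1))` is a negative lookahead; `(?1)` re-runs the pattern of group 1, so it asserts that the next characters are not `abc`.
`.` matches any character, in this case the `S`.
`)*?` closes the group and repeats it un-greedily, matching as few as zero characters.
`x`, which the question designated as the ending character.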
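The same construction generalizes to the PXQ form from the question, with X taken to be any string as in the answer. A minimal sketch, assuming GNU grep with PCRE support; the helper `shortest_pxq` is hypothetical and not part of the answer above. P is written once, as group 1, and reused through `(?1)`, which also covers the parenthesized example from the question without retyping `[(]` inside the middle part:

```shell
# Hypothetical helper: print each substring of stdin matching P X Q,
# where no position inside X starts a match of P.
# $1 = pattern P, $2 = pattern Q; needs a grep built with PCRE (-P).
shortest_pxq() {
  # ($1)              P, captured as group 1
  # (?:(?!(?1)).)*?   lazily consume characters; before each one, (?1)
  #                   re-runs group 1's pattern and the lookahead rejects it
  # (?:$2)            Q, wrapped so an alternation inside Q stays contained
  # The middle part is single-quoted so an interactive bash does not try
  # to treat "!(" as a history expansion.
  grep -Po "($1)"'(?:(?!(?1)).)*?'"(?:$2)"
}

printf '%s\n' abcRabcSxyxz | shortest_pxq 'abc' 'x'     # abcSx
printf '%s\n' '(a(b)c)' | shortest_pxq '[(]' '[)]'       # (b)
```

Like the original pattern, this relies on P being group 1, which is why the middle part and Q use only non-capturing groups; any capturing groups inside P itself are numbered after it and do not disturb `(?1)`.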