Scheme

Reputation: 23

Regex, everything between two characters except escaped characters

I've searched extensively for this and found similar problems, but I still haven't been able to figure it out.

My problem is that I have, among others, strings of this form:

%Aliquam hendrerit mollis pretium! Praesent id%
%molestie *libero vel\%\% pulvinar? Sed%
\%% urna. \% Fusce% in *sapien %mau*ris.%

I want to select everything between two % characters, ignoring any % that is preceded by a \. The first one is trivial, and I have somehow been able to do the second one. The third one, however, I just can't figure out. To clarify, from the text above I want to select the following:

"%Aliquam hendrerit mollis pretium! Praesent id%"

"%molestie *libero vel\%\% pulvinar? Sed%"

"% urna. \% Fusce%"

"%mau*ris.%"

Note that the original text can be part of one long string without newlines, i.e. the samples above do not necessarily appear on separate lines.

So far I have written the following regular expression, which seems to match everything except the last one:

(?<!\\)%([^%]*)(?!%\\)(?:%|(.*)%)(?<!\\%)

For the last one it selects:

"% urna. \% Fusce% in *sapien %mau*ris.%"

Which is too much. I don't really understand why it does this; maybe it is because of the or-condition (alternation) in my regex? Any help is much appreciated!

Upvotes: 2

Views: 1338

Answers (1)

Paul-Etienne

Reputation: 1018

This regex will give you the expected result:

/(?<!\\)(%.*?(?<!\\)%)/

See this Regex101.com

Explanation

1 - (?<!\\)% matches a % character that is not preceded by a backslash.

2 - .*? matches any characters lazily, i.e. as few as possible.

3 - Surrounding (2) with (1) matches the shortest run of text delimited by two % characters, neither of which is preceded by a backslash.
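To check this against the sample text from the question, here is a minimal sketch in Python (my choice of language, since the question doesn't name one). It assumes the three sample lines are joined into one string with single spaces, and it drops the outer capture group so that findall returns the whole match. Python's re module supports the same negative lookbehind syntax.

```python
import re

# Same idea as the answer's pattern: an unescaped %, a lazy run of any
# characters, then the next unescaped %. The outer capture group is dropped
# so that findall() returns the full match.
PATTERN = re.compile(r"(?<!\\)%.*?(?<!\\)%")

# Assumption: the three sample lines are joined by single spaces into one
# long string without newlines, as described in the question.
text = (
    r"%Aliquam hendrerit mollis pretium! Praesent id% "
    r"%molestie *libero vel\%\% pulvinar? Sed% "
    r"\%% urna. \% Fusce% in *sapien %mau*ris.%"
)

matches = PATTERN.findall(text)
for m in matches:
    print(m)
```

This prints the four strings listed in the question. One caveat: the lookbehind only inspects the single character before each %, so a % that follows an escaped backslash (\\%) would still be treated as escaped. The sample data doesn't contain that case, but it would need a different pattern.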

Upvotes: 2
