Reputation: 23
I've searched extensively for this and found similar problems, but I still haven't been able to figure it out.
My problem is that I have, among others, strings of this form:
%Aliquam hendrerit mollis pretium! Praesent id%
%molestie \*libero vel\%\% pulvinar? Sed%
\%% urna. \% Fusce% in *sapien %mau\*ris.%
I want to select everything between two % characters, ignoring any % that is preceded by a \. The first one is trivial, and I have somehow managed the second one. The third one, however, I just can't figure out. To clarify, from the text above I want to select the following:
"%Aliquam hendrerit mollis pretium! Praesent id%"
"%molestie *libero vel\%\% pulvinar? Sed%"
"% urna. \% Fusce%"
"%mau*ris.%"
I want to point out that the original text can be part of one long string without newlines, i.e. each item does not necessarily appear on its own line.
So far I have written the following regular expression, which seems to match everything except the last one:
(?<!\\)%([^%]*)(?!%\\)(?:%|(.*)%)(?<!\\%)
For the last one it selects:
"% urna. \% Fusce% in *sapien %mau*ris.%"
Which is too much. I don't really understand why it does that; maybe it is because of the or-condition in my regex? Any help is much appreciated!
Upvotes: 2
Views: 1338
Reputation: 1018
This regex will give you the expected result:
/(?<!\\)(%.*?(?<!\\)%)/
See this Regex101.com
Explanation
1 - (?<!\\)%
matches any % character that is not preceded by a backslash.
2 - .*?
matches any characters lazily, i.e. as few as possible, so the match stops at the first closing % that qualifies.
3 - Surrounding (2) with (1) matches any text enclosed by two % characters, neither of which is preceded by a backslash.
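For reference, here is a minimal Python sketch of the pattern applied to the three sample strings from the question, joined into one line with spaces since the question says they may not be on separate lines. Python and the names `PATTERN`, `text` and `matches` are my own assumptions for illustration. Note that the pattern only looks at the single character before each %, so a % following an escaped backslash (`\\%`) would still be treated as escaped.

```python
import re

# Same pattern as above, without the / delimiters.
# The capturing group holds the text including both % delimiters.
PATTERN = re.compile(r"(?<!\\)(%.*?(?<!\\)%)")

# The three sample strings from the question, joined with spaces
# (raw strings so the backslashes are kept literally).
text = (
    r"%Aliquam hendrerit mollis pretium! Praesent id% "
    r"%molestie \*libero vel\%\% pulvinar? Sed% "
    r"\%% urna. \% Fusce% in *sapien %mau\*ris.%"
)

# With a single capturing group, findall returns the group contents.
matches = PATTERN.findall(text)

for m in matches:
    print(m)
# %Aliquam hendrerit mollis pretium! Praesent id%
# %molestie \*libero vel\%\% pulvinar? Sed%
# % urna. \% Fusce%
# %mau\*ris.%
```

Because `.` does not match newlines by default, a span that runs across a line break would need the `re.DOTALL` flag.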
Upvotes: 2