Reputation: 75
I'm trying to create a regex that matches [[xyz|asd]] but not [[xyz]] in the following text:
'''Diversity Day'''" is the second episode of the [[The Office (U.S. season 1)]|first season]] of the American [[comedy]] [[television program|television series]] ''[[The Office (U.S. TV series)|The Office]]'', and the show's second episode overall. Written by [[B. J. Novak]] and directed by [[Ken Kwapis]], it first aired in the United States on March 29, 2005, on [[NBC]]. The episode guest stars ''Office'' consulting producer [[Larry Wilmore]] as [[List_of_characters_from_The_Office_(US)#Mr._Brown|Mr. Brown]].
The following results should be captured:
[[The Office (U.S. season 1)]|first season]] <-- note the "]" before "|": in this case "]" is a literal character, not part of the closing "]]"
[[television program|television series]]
[[The Office (U.S. TV series)|The Office]]
[[List_of_characters_from_The_Office_(US)#Mr._Brown|Mr. Brown]]
The regex I was trying to use is:
\[\[([^|]+)\|([^|]+)\]\]
but I can't figure out how to exclude both "|" and "]]" from the groups. [^|(]])] won't work because a character class excludes only the single character "]", not the sequence "]]" (it needs to be treated as a whole word).
Please help, thanks!
Upvotes: 2
Views: 545
Reputation: 626825
You may rely on a tempered greedy token here:
\[\[((?:(?!]]).)*)\|((?:(?!]]).)*)]]
See the regex demo
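As a rough illustration, here is how the pattern behaves on a shortened piece of the question's text. Python is an assumption made for this sketch (the question does not name a language); its `re` module accepts this expression unchanged. The names `TEMPERED`, `sample` and `pairs` are made up for the example.

```python
import re

# Tempered greedy token pattern from this answer, unchanged.
TEMPERED = re.compile(r"\[\[((?:(?!]]).)*)\|((?:(?!]]).)*)]]")

# Shortened sample taken from the question's text.
sample = (
    "episode of the [[The Office (U.S. season 1)]|first season]] of the "
    "American [[comedy]] [[television program|television series]] "
    "''[[The Office (U.S. TV series)|The Office]]''"
)

# findall returns one (group 1, group 2) tuple per match;
# links without a "|" such as [[comedy]] produce no match.
pairs = TEMPERED.findall(sample)
for target, label in pairs:
    print(f"{target!r} -> {label!r}")
```

If the input spans several lines, passing `re.DOTALL` to `re.compile` is the Python counterpart of `RegexOptions.Singleline`.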
Details:
- \[\[ - 2 [ symbols
- ((?:(?!]]).)*) - Group 1 (note the * can be turned into a lazy *? here, especially if the first parts are shorter than the second parts) capturing:
  - (?:(?!]]).)* - zero or more sequences of
    - . - any char (but a newline; use the pattern with RegexOptions.Singleline if your strings span across multiple lines)...
    - (?!]]) - ...that is not starting a ]] sequence (i.e. the . does not match a ] that is followed with another ])
- \| - a literal |
- ((?:(?!]]).)*) - Group 2, capturing the same subpattern as Group 1
- ]] - 2 literal ] symbols at the end.

A much more efficient "unrolled" version of this regex is:
\[\[([^]|]*(?:](?!])[^]|]*)*)\|([^]]*(?:](?!])[^]]*)*)]]
See the regex demo. This regex will treat the first | as the inner field separator. See my other answer about how to unroll tempered greedy tokens.
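To make the "first |" remark concrete, the sketch below compares the two patterns. Python is again an assumption, and the input `[[a|b|c]]` is a hypothetical string that does not appear in the question; it only serves to show where each pattern splits when a link contains more than one pipe.

```python
import re

# Both patterns are copied from this answer without changes.
TEMPERED = re.compile(r"\[\[((?:(?!]]).)*)\|((?:(?!]]).)*)]]")
UNROLLED = re.compile(
    r"\[\[([^]|]*(?:](?!])[^]|]*)*)\|([^]]*(?:](?!])[^]]*)*)]]"
)

# Hypothetical input with two pipes (not from the question).
two_pipes = "[[a|b|c]]"

# Greedy Group 1 in the tempered pattern backtracks to the LAST pipe.
greedy_split = TEMPERED.findall(two_pipes)
# Group 1 in the unrolled pattern cannot contain "|", so it stops at the FIRST pipe.
first_pipe_split = UNROLLED.findall(two_pipes)

# The literal "]" before "|" from the question is still handled,
# and a link without a pipe is still skipped.
tricky = "[[The Office (U.S. season 1)]|first season]] and [[comedy]]"
unrolled_tricky = UNROLLED.findall(tricky)

print(greedy_split)
print(first_pipe_split)
print(unrolled_tricky)
```

On inputs with a single pipe per link, such as the question's text, the two patterns capture the same groups; they only differ when Group 1 could swallow an extra |.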
Upvotes: 6