Regex that excludes groups preceded by curly brackets and only matches text within the first section of the brackets

Question

I'm writing a Python script to parse Wikipedia articles, and part of that process is parsing links. I'm trying to write a regular expression that matches in this way:

[[:Category:Anarchism by country|Anarchism by country]] -> :Category:Anarchism by country
[[Squatting|squat]] -> Squatting
[[File:Jarach and Zerzan.JPG|thumb|Lawrence Jarach (left) and [[John Zerzan]] (right) -> John Zerzan
* {{cite book |last=Avrich |first=Paul |author-link=Paul Avrich |title=[[Anarchist Voices: An Oral History of Anarchism in America]] |year=1996 |publisher=[[Princeton University Press]] |isbn=978-0-691-04494-1 -> Unmatched, begins with * {{ (citation)

I've reached \[\[([^|\]]+)(?:\|[^|\]]+)?\]\] which works in 3 of the above examples, but in the citation it matches the title and the publisher. I know (I think) I need a negative lookahead to prevent any matches in the last example. I'm very bad with regex, however, so any suggestions would be greatly appreciated.
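For reference, here is a minimal reproduction of the behaviour described above, using the four sample lines. The names LINK_RE, link_targets and link_targets_outside_templates are only illustrative. The second function is a rough workaround rather than the regex-only solution being asked for: it assumes a link counts as "inside a citation" whenever an unclosed {{ appears earlier on the same line, which may not hold for every article.

```python
import re

# The pattern from the question: capture the link target,
# optionally followed by a single |label, then the closing ]].
LINK_RE = re.compile(r"\[\[([^|\]]+)(?:\|[^|\]]+)?\]\]")

samples = [
    "[[:Category:Anarchism by country|Anarchism by country]]",
    "[[Squatting|squat]]",
    "[[File:Jarach and Zerzan.JPG|thumb|Lawrence Jarach (left) and [[John Zerzan]] (right)",
    "* {{cite book |last=Avrich |first=Paul |author-link=Paul Avrich "
    "|title=[[Anarchist Voices: An Oral History of Anarchism in America]] "
    "|year=1996 |publisher=[[Princeton University Press]] |isbn=978-0-691-04494-1",
]


def link_targets(line):
    """Return every link target the current pattern finds in one line."""
    return LINK_RE.findall(line)


def link_targets_outside_templates(line):
    """Hypothetical workaround: drop matches that follow an unclosed '{{'.

    Assumption: templates do not span multiple lines, so counting
    '{{' and '}}' before the match is enough to tell if we are inside one.
    """
    targets = []
    for m in LINK_RE.finditer(line):
        before = line[:m.start()]
        if before.count("{{") > before.count("}}"):
            continue  # still inside a template such as {{cite book ...
        targets.append(m.group(1))
    return targets


for s in samples:
    print(link_targets(s), link_targets_outside_templates(s))
```

With the plain pattern, the citation line yields both the title and the publisher; the post-filter returns nothing for that line and leaves the other three results unchanged.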
