Reputation: 3601
I want to use a regular expression to extract content by negating a group (instead of doing a search and replace).
To get an Infobox block I am using the following regex.
(\{\{Infobox(?:.*?)^\}\})
How do I negate that group, so that the text without the infobox is returned? I have tried many combinations, such as
(.*(?!(?:\{\{Infobox(?:.*?)^\}\})).*)
Here is a sample of the text that I am trying to extract from.
<username>Majorclanger</username>
<id>817248</id>
</contributor>
<minor />
<comment>rm unneeded hyphen</comment>
<text xml:space="preserve">{{sprotected2}}
{{Infobox MLB player
| birthplace = {{city-state|Riverside|California}}
| debutdate = May 30
| debutyear = 1986
}}
==Early life==
{{Infobox Person
|parents =
|relatives =
|signature =
|website =
}}
Born in {{city-state|Riverside|California}}, Bonds grew up in {{city-state|San Carlos|California}} and attended
Upvotes: 0
Views: 274
Reputation: 14251
It might depend on the regex dialect of the language you're working with. In Python you could do the following:
import re
# s holds the page text; re.DOTALL lets '.' match across newlines
pattern = re.compile(r'\{\{Infobox.*?\n\}\}', re.DOTALL)
print(pattern.sub('', s))
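If you would rather not substitute at all, one option is to split on the infobox pattern and keep what lies between the matches. Below is a minimal sketch of both approaches, assuming each infobox ends at the first line that begins with `}}`. The `sample` string is shortened from the question's text, and the names `INFOBOX`, `strip_infoboxes` and `non_infobox_parts` are made up for illustration.

```python
import re

# Hypothetical sample input, shortened from the text in the question.
sample = """{{sprotected2}}
{{Infobox MLB player
| birthplace = {{city-state|Riverside|California}}
| debutyear = 1986
}}
==Early life==
{{Infobox Person
|parents =
|website =
}}
Born in {{city-state|Riverside|California}}, Bonds grew up
"""

# Assumption: an infobox starts at "{{Infobox" and ends at the first line
# that begins with "}}". DOTALL lets '.' cross newlines; MULTILINE lets '^'
# match at the start of every line. The optional '\n' also consumes the
# line break that follows the closing braces.
INFOBOX = re.compile(r'\{\{Infobox.*?^\}\}\n?', re.DOTALL | re.MULTILINE)

def strip_infoboxes(text):
    """Return text with every infobox block removed (substitution)."""
    return INFOBOX.sub('', text)

def non_infobox_parts(text):
    """Return the non-empty pieces of text between infobox blocks (no substitution)."""
    return [part for part in INFOBOX.split(text) if part]

print(strip_infoboxes(sample))
print(non_infobox_parts(sample))
```

Nested templates such as `{{city-state|...}}` are not a problem here because their closing braces do not sit at the start of a line. An infobox that contains a nested template closed on its own line would be cut short, so treat this as a starting point rather than a full wikitext parser.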
Upvotes: 1