Reputation: 11
some text I want to capture. <tag> junk I don't care about</tag> more stuff I want.
Is there an easy way to write a regex that captures the first and third sentences in one capture?
Upvotes: 1
Views: 169
Reputation: 342323
Here's a non-regex way: split on </tag>, go through the resulting items, and for each one that contains <tag>, split on <tag> and take the first element. For example:
>>> s = "some text I want to capture. <tag> junk I don't care about</tag> more stuff I want. <tag> don't care </tag> i care"
>>> for item in s.split("</tag>"):
...     if "<tag>" in item:
...         print(item.split("<tag>")[0])
...     else:
...         print(item)
...
some text I want to capture.
more stuff I want.
i care
You can do the same with the Split() method in ASP.NET.
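The same split-based idea can be wrapped in a small function. This is only a minimal Python 3 sketch: the name `text_outside_tags` and its `open_tag`/`close_tag` parameters are made up for illustration, and it assumes the tags are not nested.

```python
def text_outside_tags(s, open_tag="<tag>", close_tag="</tag>"):
    """Collect the pieces of s that fall outside open_tag ... close_tag pairs."""
    kept = []
    for item in s.split(close_tag):
        # split(open_tag)[0] is the text before the opening tag,
        # or the whole item when no opening tag is present.
        kept.append(item.split(open_tag)[0])
    return kept


s = ("some text I want to capture. <tag> junk I don't care about</tag>"
     " more stuff I want. <tag> don't care </tag> i care")
for piece in text_outside_tags(s):
    print(piece.strip())
```

Joining the returned list with "".join(...) gives the wanted text as a single string, which is as close to "one capture" as this approach gets.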
Upvotes: 0
Reputation: 1828
You could also consider stripping out the unwanted data and then capturing.
import re
data = "some text to capture. <tag>junk</tag> other stuff to capture"
# re.sub takes (pattern, replacement, string)
data = re.sub(r'<tag>[^<]*</tag>', '', data)
data_match = re.match(r'[\w. ]+', data)
Upvotes: 1
Reputation: 6465
A single capture group always matches one contiguous span, so you can't. You can do it in one pass with a regex like the one below and then join the two groups in code:
^(?<line1>.*?)(?:\<\w*\>.*?\</\w*\>)(?<line3>.*?)$
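The pattern above uses .NET-style named groups. As a rough sketch of the "capture twice, then join" step, here is a Python version; it assumes Python's `re` module, where named groups are written `(?P<name>...)`, and the function name `capture_around_tag` is hypothetical.

```python
import re

# Python spells named groups (?P<name>...) rather than (?<name>...).
PATTERN = re.compile(r'^(?P<line1>.*?)(?:<\w*>.*?</\w*>)(?P<line3>.*?)$')


def capture_around_tag(text):
    """Return the text before and after the first tag pair, joined together."""
    m = PATTERN.match(text)
    if m is None:
        return None  # no tag pair found
    return m.group('line1') + m.group('line3')


s = "some text I want to capture. <tag> junk I don't care about</tag> more stuff I want."
print(capture_around_tag(s))
```

Note that the spaces on either side of the tag pair are both kept, so the joined result may need whitespace cleanup.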
Upvotes: 0
Reputation: 4839
Unfortunately, no, it's not possible. The solution is to capture into two separate groups and then concatenate them after the fact.
According to this older thread on this site:
Regular expression to skip character in capture group
Upvotes: 0
Reputation: 526573
Not to my knowledge. Usually that's why regex search-and-replace functions allow you to refer to multiple capturing groups in the first place.
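To illustrate what referring to multiple capturing groups in a replacement looks like, here is a small sketch using Python's `re.sub`. The function name `drop_tagged_span` is made up for this example, and the pattern assumes a single, non-nested tag pair on one line.

```python
import re


def drop_tagged_span(text):
    """Rebuild the string from the two groups around the tag pair."""
    # \1 is the text before the tag pair, \2 the text after it.
    return re.sub(r'^(.*?)<\w+>.*?</\w+>(.*)$', r'\1\2', text)


s = "some text I want to capture. <tag> junk I don't care about</tag> more stuff I want."
print(drop_tagged_span(s))
```

If the pattern does not match, `re.sub` returns the input unchanged.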
Upvotes: 0