mike

Reputation: 11

combining captures in regex

some text I want to capture. <tag> junk I don't care about</tag> more stuff I want.

Is there an easy way to write a regex that captures the first and third sentences in a single capture?

Upvotes: 1

Views: 169

Answers (5)

ghostdog74

Reputation: 342323

Here's a non-regex way: split on </tag>, go through the resulting items, and for each one that contains <tag>, split on <tag> and keep the first element. For example:

>>> s="some text I want to capture. <tag> junk I don't care about</tag> more stuff I want. <tag> don't care </tag> i care"
>>> for item in s.split("</tag>"):
...     if "<tag>" in item:
...        print(item.split("<tag>")[0])
...     else:
...        print(item)
...
some text I want to capture.
 more stuff I want.
 i care

The same approach works in ASP.NET using the String.Split() method.

Upvotes: 0

Jake Woods

Reputation: 1828

You could also consider stripping out the unwanted data and then capturing.

import re

data = "some text to capture. <tag>junk</tag> other stuff to capture"
# Remove the tagged span first, then match what is left.
data = re.sub(r'<tag>[^<]*</tag>', '', data)
data_match = re.match(r'[\w\. ]+', data)

Upvotes: 1

Fadrian Sudaman

Reputation: 6465

A capture group always matches one contiguous span, so you can't. You can, however, do it in one pass with a regex like the one below and join the two pieces in code:

^(?<line1>.*?)(?:\<\w*\>.*?\</\w*\>)(?<line3>.*?)$
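For comparison, here is a rough Python sketch of the same idea. It assumes Python's re module, which spells named groups (?P<name>...) rather than (?<name>...); the variable names are only illustrative.

```python
import re

text = "some text I want to capture. <tag> junk I don't care about</tag> more stuff I want."

# Two named groups around a non-capturing group that swallows the tagged span.
pattern = re.compile(r"^(?P<line1>.*?)(?:<\w*>.*?</\w*>)(?P<line3>.*?)$")

m = pattern.match(text)
# Join the two captures in code; fall back to the raw text if there is no tag.
combined = m.group("line1") + m.group("line3") if m else text
print(combined)
```

The joined string keeps the whitespace on both sides of the removed tag, so you may want to normalize spaces afterwards.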

Upvotes: 0

bdk

Reputation: 4839

Unfortunately no, it's not possible. The solution is to capture into two separate groups and then concatenate them after the fact.

According to this older thread on this site:

Regular expression to skip character in capture group

Upvotes: 0

Amber

Reputation: 526573

Not to my knowledge. Usually that's why regex search-and-replace functions allow you to refer to multiple capturing groups in the first place.
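As a small illustration of that point, a search-and-replace can stitch two groups together in the replacement string. This is just a sketch in Python, assuming a single literal <tag>...</tag> pair in the input:

```python
import re

text = "some text I want to capture. <tag> junk I don't care about</tag> more stuff I want."

# Group 1 is the text before the tag, group 2 is the text after it.
# The replacement refers to both groups and drops everything in between.
result = re.sub(r"^(.*?)<tag>.*?</tag>(.*)$", r"\1\2", text)
print(result)
```

If the input has no tag at all, the pattern does not match and re.sub returns the text unchanged.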

Upvotes: 0
