Archetype2

Reputation: 97

Regex: match text between 2 items

How would I match the text below to get the following strings:

  1. from the 1st title to the 3rd closing </a> tag
  2. from the 2nd title to the 6th closing </a> tag (and so on: the 3rd title to the 9th closing </a> tag, etc.)

Here is the string to be matched:

title
<a></a>
content here
<a></a>
text...
<a></a>
text...
title 
<a></a>
<a></a>
<a></a>

I tried using .*, but it captured everything from the first title to the last </a> tag.

Upvotes: 1

Views: 185

Answers (2)

Akinakes

Reputation: 657

from re import findall, DOTALL

text = '''
title
<a></a>
content here
<a></a>
text...
<a></a>
text...
title 
<a></a>
<a></a>
<a></a>
'''
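# .*? is non-greedy, so each .*?</a> stops at the nearest closing tag;
# DOTALL lets . match newlines as well.
print(findall(r'title.*?</a>.*?</a>.*?</a>', text, DOTALL))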

This prints:

['title\n<a></a>\ncontent here\n<a></a>\ntext...\n<a></a>', 'title \n<a></a>\n<a></a>\n<a></a>']

You can also repeat the lazy part with a quantifier instead of writing it out three times:

# (?:...) groups without capturing, so findall still returns the whole match.
print(findall(r'title(?:.*?</a>){3}', text, DOTALL))
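The {3} quantifier generalizes to any count. Below is a minimal sketch, assuming the start marker and the closing tag are fixed literal strings; the helper name blocks_after_title and its parameters are hypothetical, chosen for illustration only. The sample text is built with "\n".join so the trailing space after the second title stays visible.

```python
import re


def blocks_after_title(text, n=3, marker="title", closing="</a>"):
    """Hypothetical helper: return each span from `marker` through the n-th `closing` after it.

    re.escape guards against regex metacharacters in the literals, and the lazy
    .*? inside the repeated group stops at the nearest closing tag each time.
    """
    pattern = re.escape(marker) + "(?:.*?" + re.escape(closing) + "){" + str(n) + "}"
    return re.findall(pattern, text, re.DOTALL)


# Same input as in the question, one list item per line.
text = "\n".join([
    "title", "<a></a>", "content here", "<a></a>", "text...", "<a></a>",
    "text...", "title ", "<a></a>", "<a></a>", "<a></a>",
])

for block in blocks_after_title(text):
    print(repr(block))
```

Because findall resumes searching after the end of each match, every block starts at the next title that follows the previous block's last closing tag.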

Upvotes: 1

Hyperboreus

Reputation: 32459

In general, * is greedy (it matches as much as possible), while *? is reluctant, or lazy (it matches as little as possible). Try replacing .* with .*?.
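A small sketch of the difference, using a shortened one-line sample string made up for illustration: the greedy pattern runs to the last </a>, while the lazy one stops at the first. On its own, .*? therefore stops at the first closing tag; reaching the third still requires repeating the lazy part three times.

```python
import re

sample = "title <a></a> one <a></a> title <a></a>"

greedy = re.findall(r"title.*</a>", sample)  # .* grabs as much as possible
lazy = re.findall(r"title.*?</a>", sample)   # .*? stops at the nearest </a>

print(greedy)  # ['title <a></a> one <a></a> title <a></a>']
print(lazy)    # ['title <a></a>', 'title <a></a>']
```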

Upvotes: 0
