Reputation: 97
How would I match the text below to get two strings, one per title block?
Here is the string to be matched:
title
<a></a>
content here
<a></a>
text...
<a></a>
text...
title
<a></a>
<a></a>
<a></a>
I tried using .*, but this captured everything from the first title to the last a tag.
Upvotes: 1
Views: 185
Reputation: 657
from re import findall, DOTALL
text = '''
title
<a></a>
content here
<a></a>
text...
<a></a>
text...
title
<a></a>
<a></a>
<a></a>
'''
print(findall(r'title.*?</a>.*?</a>.*?</a>', text, DOTALL))
gives
['title\n<a></a>\ncontent here\n<a></a>\ntext...\n<a></a>', 'title\n<a></a>\n<a></a>\n<a></a>']
You can also use
print(findall(r'title(?:.*?</a>){3}', text, DOTALL))
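One detail worth spelling out: the repeated group is written as (?:...) rather than (...) because findall returns the contents of a capturing group instead of the whole match. Here is a minimal sketch of the difference, assuming Python 3; `sample` is a shortened stand-in for the question's text and the variable names are mine, not from the question.

```python
import re

# Shortened stand-in for the question's text (assumed for illustration only).
sample = "title\n<a></a>\nx\n<a></a>\ny\n<a></a>\nz\ntitle\n<a></a>\n<a></a>\n<a></a>\n"

# Non-capturing group: findall returns each whole match.
whole = re.findall(r'title(?:.*?</a>){3}', sample, re.DOTALL)

# Capturing group: findall returns only what the group captured,
# which for a repeated group is its last repetition.
last_only = re.findall(r'title(.*?</a>){3}', sample, re.DOTALL)

print(whole)
print(last_only)
```

With the capturing version you only get the text of the third repetition for each title, which is rarely what you want here.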
Upvotes: 1
Reputation: 32459
Generally, * is greedy, while *? is reluctant (lazy). Try replacing .* with .*? in your pattern.
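To see the difference, here is a small sketch, assuming Python 3 and the DOTALL flag so that . also matches newlines. The string `sample` is a cut-down version of the question's text and is my own assumption, not the asker's exact input.

```python
import re

# Cut-down stand-in for the question's text (assumed for illustration only).
sample = "title\n<a></a>\nmiddle\n<a></a>\n"

# Greedy: .* grabs as much as possible, so the match runs to the LAST </a>.
greedy = re.search(r'title.*</a>', sample, re.DOTALL).group()

# Reluctant: .*? grabs as little as possible, so the match stops at the FIRST </a>.
lazy = re.search(r'title.*?</a>', sample, re.DOTALL).group()

print(repr(greedy))
print(repr(lazy))
```

The greedy pattern swallows both a tags, while the reluctant one stops after the first.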
Upvotes: 0