Reputation: 5692
Suppose that I have the following string:
mystr = """
<p>Some text and another text. </p> ![image_file_1][image_desc_1] some other text.
<p>some text</p>
![image_file_2][image_desc_2] and image: ![image_file_3][image_desc_3]
test case 1: ![dont_match_1]
test case 2: [dont_match_2][dont_match_3]
finally: ![image_file_4][image_desc_4]
"""
I can get the image_file_X values using the following code:
import re
re.findall(r'(?<=!\[)[^]]+(?=\]\[.*?\])', mystr)
I want to capture the image_desc_X values, but the following does not work:
re.findall(r'(?!\[.*?\]\[)[^]]+(?=\])', mystr)
Any suggestions? If I can get both the image_file and image_desc values using one command, that would be even better.
Upvotes: 1
Views: 1994
Reputation: 92854
Use the following approach:
result = re.findall(r'!\[([^]]+)\]\[([^]]+)\]', mystr)
print(result)
The output:
[('image_file_1', 'image_desc_1'), ('image_file_2', 'image_desc_2'), ('image_file_3', 'image_desc_3'), ('image_file_4', 'image_desc_4')]
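If you need the file names and descriptions as two separate lists, the pairs can be split after matching. This is a minimal, self-contained sketch; the names `pattern`, `pairs`, `files` and `descs` are my own, not from the question. It also checks that the two "dont_match" test cases are skipped, because the pattern requires a `!` followed by two adjacent bracket groups.

```python
import re

mystr = """
<p>Some text and another text. </p> ![image_file_1][image_desc_1] some other text.
<p>some text</p>
![image_file_2][image_desc_2] and image: ![image_file_3][image_desc_3]
test case 1: ![dont_match_1]
test case 2: [dont_match_2][dont_match_3]
finally: ![image_file_4][image_desc_4]
"""

# Two capture groups: group 1 is the file name, group 2 is the description.
pattern = re.compile(r'!\[([^]]+)\]\[([^]]+)\]')

# findall returns one (file, desc) tuple per match when there are two groups.
pairs = pattern.findall(mystr)

# Split the tuples into two parallel lists.
files = [f for f, _ in pairs]
descs = [d for _, d in pairs]

print(files)
print(descs)
```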
Upvotes: 2
Reputation: 98921
I guess you can use:
for match in re.finditer(r"!\[(.*?)\]\[(.*?)\]", mystr):
    print(match.group(1))  # image file
    print(match.group(2))  # image description
Output:
image_file_1
image_desc_1
image_file_2
image_desc_2
image_file_3
image_desc_3
image_file_4
image_desc_4
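A variation on the same idea, in case numbered groups get hard to read: named groups let each match be turned into a dict. This is only a sketch; the group names `file` and `desc` and the short sample string `text` are made up for illustration.

```python
import re

# Hypothetical short sample with one real image and two non-matching cases.
text = "a ![image_file_1][image_desc_1] b ![dont_match_1] c [x][y]"

# (?P<name>...) gives each capture group a name instead of a number.
pattern = re.compile(r"!\[(?P<file>[^]]+)\]\[(?P<desc>[^]]+)\]")

# groupdict() maps each group name to the text it captured.
images = [m.groupdict() for m in pattern.finditer(text)]
print(images)
```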
Upvotes: 1