HBat

Reputation: 5692

Matching multiple arguments in a string using regex

Suppose that I have the following string:

mystr = """
<p>Some text and another text. </p> ![image_file_1][image_desc_1] some other text. 
<p>some text</p> 
![image_file_2][image_desc_2] and image: ![image_file_3][image_desc_3] 
test case 1: ![dont_match_1]
test case 2: [dont_match_2][dont_match_3]
finally: ![image_file_4][image_desc_4]
"""

I can get the image_file_X values using the following code:

import re
re.findall(r'(?<=!\[)[^]]+(?=\]\[.*?\])', mystr)

I also want to capture the image_desc_X values, but the following does not work:

re.findall('(?!\[.*?\]\[)[^]]+(?=\])', mystr)

Any suggestions? If I could get both the image_file and image_desc values with a single command, that would be even better.

Upvotes: 1

Views: 1994

Answers (2)

RomanPerekhrest

Reputation: 92854

Use two capture groups, one per bracket pair, so that re.findall returns a (file, description) tuple for each match:

result = re.findall(r'!\[([^]]+)\]\[([^]]+)\]', mystr)
print(result)

The output:

[('image_file_1', 'image_desc_1'), ('image_file_2', 'image_desc_2'), ('image_file_3', 'image_desc_3'), ('image_file_4', 'image_desc_4')]
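
If two separate lists are more convenient than a list of tuples, the pairs can be unzipped afterwards. This is only a minimal sketch built on the same pattern; the names `pairs`, `files` and `descs` are illustrative, and `mystr` is copied from the question.

```python
import re

mystr = """
<p>Some text and another text. </p> ![image_file_1][image_desc_1] some other text.
<p>some text</p>
![image_file_2][image_desc_2] and image: ![image_file_3][image_desc_3]
test case 1: ![dont_match_1]
test case 2: [dont_match_2][dont_match_3]
finally: ![image_file_4][image_desc_4]
"""

# Each match yields one (file, description) tuple.
pairs = re.findall(r'!\[([^]]+)\]\[([^]]+)\]', mystr)

# Unzip into two parallel lists; guard against the no-match case,
# where zip(*pairs) would produce nothing to unpack.
if pairs:
    files, descs = (list(column) for column in zip(*pairs))
else:
    files, descs = [], []

print(files)
print(descs)
```

The two test cases are skipped because the pattern requires a literal `!`, then a bracketed group, then immediately a second bracketed group.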

Upvotes: 2

Pedro Lobito

Reputation: 98921

I guess you can use:

for match in re.finditer(r"!\[(.*?)\]\[(.*?)]", mystr):
    print(match.group(1))  # image file
    print(match.group(2))  # image description

Output:

image_file_1
image_desc_1
image_file_2
image_desc_2
image_file_3
image_desc_3
image_file_4
image_desc_4
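
A possible variant, offered as a sketch rather than the only way to do it: named groups make the two parts self-describing, and a negated character class keeps a match from running past a closing bracket, which `.*?` can do on the same line. The group names `file` and `desc`, the constant `IMAGE_RE` and the helper `extract_images` are all hypothetical names chosen for this example.

```python
import re

# Named groups: "file" is the first bracket pair, "desc" the second.
# [^\]]+ matches one or more characters that are not a closing bracket.
IMAGE_RE = re.compile(r"!\[(?P<file>[^\]]+)\]\[(?P<desc>[^\]]+)\]")


def extract_images(text):
    """Return a dict mapping each image file to its description."""
    return {m.group("file"): m.group("desc") for m in IMAGE_RE.finditer(text)}


sample = """
![image_file_2][image_desc_2] and image: ![image_file_3][image_desc_3]
test case 1: ![dont_match_1]
test case 2: [dont_match_2][dont_match_3]
"""

images = extract_images(sample)
for name, description in images.items():
    print(name, description)
```

A dict keeps only the last description if the same file name appears twice; use a list of tuples instead if duplicates matter.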

Upvotes: 1
