Amelio Vazquez-Reina

Reputation: 96284

Building a regular expression to match the first of multiple occurrences in Python

Say I have the following text:

*<string1>*<string2>*<string3>*

where * indicates any text except things surrounded by <>.

I would like to be able to capture string1 (the first occurrence of text wrapped by the characters <>).

I have tried using:

import re

r = re.compile('.*<(.*?)>.*(<.*?>)*.*')
m = r.search(my_text)
match = m.group(1)

but it didn't work: group(1) comes back as string3, not string1.
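The attempt fails because of its leading .*, which is greedy: it first consumes the whole string and then backtracks only as far as the last < that still lets the rest of the pattern match. A minimal sketch (the names attempt and lazy_prefix are just illustrative) compares the question's pattern with one whose prefix is made lazy:

```python
import re

my_text = "*<string1>*<string2>*<string3>*"

# The question's pattern: the leading greedy .* first consumes the whole
# string, then backtracks only until the LAST '<' lets the rest match.
attempt = re.compile('.*<(.*?)>.*(<.*?>)*.*')
print(attempt.search(my_text).group(1))  # string3

# A lazy prefix (.*?) consumes as few characters as possible,
# so the FIRST '<' is used instead.
lazy_prefix = re.compile('.*?<(.*?)>')
print(lazy_prefix.search(my_text).group(1))  # string1
```

Since search() already scans the string from left to right, the prefix can also be dropped entirely.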

If the text contains only one string surrounded by <>, I have no problem capturing it with a simpler regular expression:

r = re.compile('.*<(.*?)>.*')

But I can't find the correct regular expression when the text contains multiple such occurrences. I am also not sure I understand the roles of () and ? in this problem.
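To illustrate the roles of () and ?, here is a small comparison of a greedy and a lazy group on the same text (the variable names are illustrative). The parentheses decide what group(1) returns; the ? decides how far .* is allowed to run:

```python
import re

my_text = "*<string1>*<string2>*<string3>*"

# Greedy: .* grabs as much as possible, so the group runs to the LAST '>'.
greedy = re.search('<(.*)>', my_text)
print(greedy.group(1))  # string1>*<string2>*<string3

# Lazy: the ? after .* makes it stop at the FIRST '>' that lets the match succeed.
lazy = re.search('<(.*?)>', my_text)
print(lazy.group(1))  # string1

# The parentheses define a capturing group: group(0) is the whole match,
# group(1) is only the text inside the first pair of parentheses.
print(lazy.group(0))  # <string1>
```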

How would you capture string1, the first occurrence, in the example above?

Upvotes: 1

Views: 250

Answers (2)

John La Rooy

Reputation: 304167

This should do it:

import re
re.search("<([^>]*)", the_string).group(1)
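A slightly expanded sketch of this answer, assuming the_string holds the example text from the question. It also shows findall() and a guard for the no-match case, which search() signals by returning None:

```python
import re

the_string = "*<string1>*<string2>*<string3>*"

# [^>]* matches any run of characters other than '>', so the group cannot
# extend past the first closing bracket even though * is greedy.
pattern = re.compile(r"<([^>]*)")

first = pattern.search(the_string)
# search() returns None when there is no '<' at all, so check before .group().
if first is not None:
    print(first.group(1))  # string1

# With one capturing group, findall() returns the captured text of every match.
print(pattern.findall(the_string))  # ['string1', 'string2', 'string3']
```

The negated class [^>] is what keeps the match from spilling over into later occurrences, so no lazy ? is needed here.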

Upvotes: 0

Jacob Eggers

Reputation: 9322

Try this regex:

import re

my_text = "*<string1>*<string2>*<string3>*"
r = re.compile(r'(?<=\<)[^>]*')

print(r.search(my_text).group(0))  # first match only: string1

print(r.findall(my_text))  # list of all matches: ['string1', 'string2', 'string3']

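The (?<=\<) is a lookbehind assertion: it requires that a < immediately precede the match, but does not include the < in the match itself. [^>]* then matches everything up to the next >.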
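To make the difference from a capturing group concrete, the sketch below (illustrative names) runs the lookbehind pattern next to an equivalent pattern that consumes the < instead:

```python
import re

my_text = "*<string1>*<string2>*<string3>*"

# Lookbehind: '<' must precede the match but is not part of it,
# so group(0), the whole match, is already the bare text.
lookbehind = re.compile(r'(?<=\<)[^>]*')
print(lookbehind.search(my_text).group(0))  # string1

# Capturing group: '<' is consumed as part of the match,
# so the bare text has to be read from group(1).
capturing = re.compile(r'<([^>]*)')
m = capturing.search(my_text)
print(m.group(0))  # <string1
print(m.group(1))  # string1
```

Both approaches find the same text; the lookbehind version simply lets group(0) and findall() return it without the leading <.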

Upvotes: 1
