Reputation: 96284
Say I have the following text:
*<string1>*<string2>*<string3>*
where * indicates any text except things surrounded by <>.
I would like to be able to capture string1 (the first occurrence of text wrapped by the characters <>).
I have tried using:
import re
r = re.compile(r'.*<(.*?)>.*(<.*?>)*.*')
m = r.search(my_text)
match = m.group(1)
but it didn't work.
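To see what goes wrong, here is a small sketch that runs the pattern above against the sample text. The leading `.*` is greedy, so the search settles on the *last* `<...>` pair rather than the first. Making that leading `.*` lazy is one possible fix, shown here only for illustration.

```python
import re

my_text = "*<string1>*<string2>*<string3>*"

# The pattern from above: the leading greedy .* first consumes the whole
# string, then backtracks only as far as the LAST '<', so group(1) is string3.
r = re.compile(r'.*<(.*?)>.*(<.*?>)*.*')
m = r.search(my_text)
print(m.group(1))  # string3

# Making the leading .* lazy (.*?) lets it stop at the FIRST '<' instead.
lazy = re.compile(r'.*?<(.*?)>')
print(lazy.search(my_text).group(1))  # string1
```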
I have no problem capturing string1 with a simpler regular expression when the text has only one occurrence of a string surrounded by <>:
r = re.compile(r'.*<(.*?)>.*')
But I can't work out the correct regex when the text contains several strings surrounded by <>. I am not sure I understand the role of () and ? in this problem correctly.
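On the role of () and ?: a short sketch on the same sample text. Parentheses create a capturing group, so `group(1)` holds only the text inside them. A `?` placed after `*` makes the repetition lazy (as few characters as possible) instead of greedy (as many as possible).

```python
import re

my_text = "*<string1>*<string2>*<string3>*"

# () creates a capturing group: group(0) is the whole match,
# group(1) is only the part inside the parentheses.
m = re.search(r'<(.*?)>', my_text)
print(m.group(0))  # <string1>
print(m.group(1))  # string1

# Without the ?, .* is greedy and runs on to the LAST '>' in the string.
greedy = re.search(r'<(.*)>', my_text)
print(greedy.group(1))  # string1>*<string2>*<string3
```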
How would you capture string1 in the example at the top?
Upvotes: 1
Views: 250
Reputation: 304167
This should do it:
re.search("<([^>]*)", the_string).group(1)
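A quick check of this answer on the sample text. The negated class `[^>]*` cannot cross a `>`, so no lazy `?` is needed. The second call shows that `re.search` returns `None` when nothing matches, which is worth guarding against before calling `.group()`.

```python
import re

the_string = "*<string1>*<string2>*<string3>*"

# [^>]* matches any run of characters other than '>', so the group
# stops at the first closing bracket even though * is greedy.
m = re.search(r"<([^>]*)", the_string)
print(m.group(1))  # string1

# With no '<' in the input, re.search returns None; check it before
# calling .group() to avoid an AttributeError.
m = re.search(r"<([^>]*)", "no brackets here")
print(m.group(1) if m else "no match")  # no match
```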
Upvotes: 0
Reputation: 9322
Try this regex:
import re
my_text = "*<string1>*<string2>*<string3>*"
r = re.compile(r'(?<=\<)[^>]*')
print(r.search(my_text).group(0))  # string1
print(r.findall(my_text))  # a list of all matches: ['string1', 'string2', 'string3']
The (?<=\<) is a lookbehind assertion: it checks that a < immediately precedes the match without including it in the match, so group(0) is just string1.
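To make the lookbehind point concrete, this sketch compares the answer's pattern with the same pattern minus the lookbehind. It also shows how `findall` behaves with and without a capturing group.

```python
import re

my_text = "*<string1>*<string2>*<string3>*"

# With the lookbehind, '<' must precede the match but is not part of it.
print(re.search(r'(?<=\<)[^>]*', my_text).group(0))  # string1

# Without it, the '<' is consumed and shows up in group(0).
print(re.search(r'<[^>]*', my_text).group(0))  # <string1

# findall returns the whole match when the pattern has no group, and only
# the group's text when it has exactly one, so both give the same list.
print(re.findall(r'(?<=\<)[^>]*', my_text))  # ['string1', 'string2', 'string3']
print(re.findall(r'<([^>]*)>', my_text))     # ['string1', 'string2', 'string3']
```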
Upvotes: 1