Reputation: 2745
My code looks like this:
import re
string = "title=abcd color=green title=efgh color=blue title=xyxyx color=yellow title=whatIwaht color=red title=xxxy red=anything title=xxxyyy color=red"
pattern = r'title=(.*?) color=red'
print(re.compile(pattern).search(string).group(0))
and I got:
"title=abcd color=green title=efgh color=blue title=xyxyx color=yellow title=whatIwaht color=red title=xxxy red=anything title=xxxyyy color=red"
But I want to find the contents of every "title" that is immediately followed by "color=red".
Upvotes: 3
Views: 139
Reputation: 114569
If you want to get the last match of a sub-regexp before a certain regexp, the solution is to use a greedy skipper. For example:
>>> import re
>>> pattern = '.*title="([^"]*)".*color="#123"'
>>> text = 'title="123" color="#456" title="789" color="#123"'
>>> print(re.match(pattern, text).group(1))
789
The first .* is greedy, so it skips as much as possible (thus skipping the first title) before backing up to the one that allows matching the desired color.
As a simpler example, consider that a(.*)b(.*)c applied to a1111b2222b3333c will match 1111b2222 in the first group and 3333 in the second.
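The greedy behaviour described here can be checked directly. The following is a minimal sketch using only the standard re module; the variable names are illustrative, and the lazy variant is included purely for comparison.

```python
import re

sample = 'a1111b2222b3333c'

# The first (.*) is greedy: it grabs as much as it can, then backs up
# only far enough for the rest of the pattern (b, second group, c) to match.
greedy_groups = re.match(r'a(.*)b(.*)c', sample).groups()
print(greedy_groups)

# For comparison, a lazy first group stops at the earliest possible 'b'.
lazy_groups = re.match(r'a(.*?)b(.*)c', sample).groups()
print(lazy_groups)
```

The greedy version splits at the last b, while the lazy version splits at the first one.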
Upvotes: 1
Reputation: 20359
Try this using the re module:
>>> import re
>>> string = 'title=abcd color=green title=efgh color=blue title=xyxyx color=yellow title=whatIwaht color=red'
>>> re.search('(.*title=?)(.*) color=red', string).group(2)
'whatIwaht'
>>> re.search('(.*title=?)(.*) color=yellow', string).group(2)
'xyxyx'
Upvotes: 0
Reputation:
Why don't you skip the regexes, and use some split functionality instead:
search_title = False
found = None
string = "title=abcd color=green title=efgh color=blue title=xyxyx color=yellow title=whatIwaht color=red title=xxxy red=anything title=xxxyyy color=red"
parts = string.split()
for part in parts:
    key, value = part.split('=', 1)
    if search_title:
        if key == 'title':
            found = value
            search_title = False
    if key == 'color' and value == 'red':
        search_title = True
print(found)
results in
xxxy
Regexes are nice, but can cause headaches at times.
Upvotes: 0
Reputation: 179552
You want what immediately precedes color=red? Then use
.*title=(.*?) color=red
Demo: https://regex101.com/r/sR4kN2/1
The leading .* greedily consumes everything before the last title that is followed by color=red, so that only the desired title is captured.
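To see the effect of the greedy prefix, here is a minimal sketch that runs the pattern with and without the leading .* on the sample string from the question. The variable names are illustrative only.

```python
import re

# Sample input taken from the question.
text = ("title=abcd color=green title=efgh color=blue "
        "title=xyxyx color=yellow title=whatIwaht color=red "
        "title=xxxy red=anything title=xxxyyy color=red")

# Without the prefix, the match starts at the first title and the lazy
# group stretches until the first " color=red" it can reach.
plain = re.search(r'title=(.*?) color=red', text).group(1)
print(plain)

# With the greedy prefix, .* swallows as much as possible, so the captured
# title is the last one that is directly followed by " color=red".
prefixed = re.search(r'.*title=(.*?) color=red', text).group(1)
print(prefixed)
```

Note that re.search returns a single match, so on this input the prefixed pattern reports only the last red title.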
Alternatively, if you know there is a character that doesn't appear in the title, you can simplify by just using a character class exclusion. For example, if you know = won't appear:
title=([^=]*?) color=red
Or, if you know whitespace won't appear:
title=([^\s]*?) color=red
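Because the character-class patterns cannot run past the end of a title, they also work with re.findall to collect every red title in one pass. A minimal sketch on the question's sample string, assuming titles contain neither = nor whitespace:

```python
import re

# Sample input taken from the question.
text = ("title=abcd color=green title=efgh color=blue "
        "title=xyxyx color=yellow title=whatIwaht color=red "
        "title=xxxy red=anything title=xxxyyy color=red")

# Titles may not contain '=': the group cannot cross into "color=...".
no_equals = re.findall(r'title=([^=]*?) color=red', text)
print(no_equals)

# Titles may not contain whitespace: the group stops at the first space.
no_space = re.findall(r'title=([^\s]*?) color=red', text)
print(no_space)
```

Both calls return the two titles that are immediately followed by color=red and skip title=xxxy, which is followed by red=anything instead.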
A third option, using a bit of code to find all red titles (assuming that the input always alternates title, color):
# string is the input from the question
for title, color in re.findall(r'title=(.*?) color=(.*?)(?: |$)', string):
    if color == 'red':
        print(title)
Upvotes: 1