Joe.Z

Reputation: 2745

How to get a value for a key in a string, when followed by another specific key=value set

My code is:

import re
string = "title=abcd color=green title=efgh color=blue title=xyxyx color=yellow title=whatIwaht color=red title=xxxy red=anything title=xxxyyy color=red"
pattern = r'title=(.*?) color=red'
print(re.compile(pattern).search(string).group(0))

and I got:

"title=abcd color=green title=efgh color=blue title=xyxyx color=yellow title=whatIwaht color=red"

But I want to find the values of all the titles that are immediately followed by "color=red" (here, whatIwaht and xxxyyy).

Upvotes: 3

Views: 139

Answers (4)

6502

Reputation: 114569

If you want the last match of a sub-regexp that comes before a certain regexp, the solution is to use a greedy skipper (a leading .*). For example:

>>> import re
>>> pattern = '.*title="([^"]*)".*color="#123"'
>>> text = 'title="123" color="#456" title="789" color="#123"'
>>> print(re.match(pattern, text).group(1))
789

The first .* is greedy: it skips as much as possible (thus skipping the first title), then backs up to the title that allows the desired color to match.

As a simpler example, consider that

a(.*)b(.*)c

processed on

a1111b2222b3333c

will match 1111b2222 in the first group and 3333 in the second.
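Both groups can be checked directly in Python. The lazy variant at the end is added only for contrast and is not part of the answer: with (.*?) the first group stops at the first b instead of the last one.

```python
import re

subject = 'a1111b2222b3333c'

# Greedy first group: (.*) first runs to the end of the input, then backtracks
# to the last 'b' that still lets 'b(.*)c' match the remainder.
greedy = re.match(r'a(.*)b(.*)c', subject)
print(greedy.group(1), greedy.group(2))  # 1111b2222 3333

# Lazy first group (for contrast): (.*?) grows one character at a time,
# so it stops at the first 'b'.
lazy = re.match(r'a(.*?)b(.*)c', subject)
print(lazy.group(1), lazy.group(2))  # 1111 2222b3333
```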

Upvotes: 1

itzMEonTV

Reputation: 20359

Try this, using the re module:

>>> import re
>>> string = 'title=abcd color=green title=efgh color=blue title=xyxyx color=yellow title=whatIwaht color=red'
>>> re.search('(.*title=?)(.*) color=red', string).group(2)
'whatIwaht'

>>> string = 'title=abcd color=green title=efgh color=blue title=xyxyx color=red'
>>> re.search('(.*title=?)(.*) color=red', string).group(2)
'xyxyx'

Upvotes: 0

user707650


Why not skip regexes and use str.split instead:

search_title = False   # True right after a color=red pair has been seen
found = None
string = "title=abcd color=green title=efgh color=blue title=xyxyx color=yellow title=whatIwaht color=red title=xxxy red=anything title=xxxyyy color=red"
parts = string.split()                 # split on whitespace into key=value tokens
for part in parts:
    key, value = part.split('=', 1)    # split each token on its first '='
    if search_title:
        if key == 'title':
            found = value              # title in the pair right after color=red
        search_title = False
    if key == 'color' and value == 'red':
        search_title = True
print(found)

results in

xxxy

Regexes are nice, but can cause headaches at times.
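The loop above records the title that comes after a color=red pair. If the goal is the question's reading (every title immediately followed by color=red), the same split approach can remember the most recent title and collect it when the next pair is color=red. This is a minimal sketch, assuming every whitespace-separated token is a key=value pair; red_titles is a hypothetical helper name, not something from the answer.

```python
def red_titles(text):
    """Return every title value whose very next key=value pair is color=red.

    Assumes every whitespace-separated token contains an '='.
    """
    found = []
    previous_title = None  # title from the immediately preceding pair, if any
    for part in text.split():
        key, value = part.split('=', 1)
        if key == 'color' and value == 'red' and previous_title is not None:
            found.append(previous_title)
        # Only keep a title if it is the pair right before the next one.
        previous_title = value if key == 'title' else None
    return found


string = ("title=abcd color=green title=efgh color=blue title=xyxyx color=yellow "
          "title=whatIwaht color=red title=xxxy red=anything title=xxxyyy color=red")
print(red_titles(string))  # ['whatIwaht', 'xxxyyy']
```

Unlike the regex answers, this collects all matching titles in one pass rather than only the last one.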

Upvotes: 0

nneonneo

Reputation: 179552

You want what immediately precedes color=red? Then use

.*title=(.*?) color=red

Demo: https://regex101.com/r/sR4kN2/1

The leading .* greedily consumes as much as it can and backs up only to the last title= that is still followed by color=red, so only that title ends up in the group.
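Applied to the string from the question, a single search with this pattern therefore returns just the last qualifying title. This is a small check of that behaviour, not part of the original answer.

```python
import re

string = ("title=abcd color=green title=efgh color=blue title=xyxyx color=yellow "
          "title=whatIwaht color=red title=xxxy red=anything title=xxxyyy color=red")

# The greedy leading .* backtracks only as far as the last 'title=' that can
# still be followed by ' color=red', so group(1) holds that final title.
m = re.search(r'.*title=(.*?) color=red', string)
print(m.group(1))  # xxxyyy
```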


Alternatively, if you know of a character that never appears in a title, you can simplify the pattern with a negated character class. For example, if you know = won't appear:

title=([^=]*?) color=red

Or, if you know whitespace won't appear:

title=([^\s]*?) color=red
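With either negated character class a candidate title can no longer run across neighbouring pairs, so re.findall returns every matching title at once. A quick check against the question's string, assuming titles contain neither = nor whitespace:

```python
import re

string = ("title=abcd color=green title=efgh color=blue title=xyxyx color=yellow "
          "title=whatIwaht color=red title=xxxy red=anything title=xxxyyy color=red")

# [^=] stops a title at the next '=', and [^\s] stops it at the next space,
# so a title not directly followed by ' color=red' simply fails to match.
print(re.findall(r'title=([^=]*?) color=red', string))   # ['whatIwaht', 'xxxyyy']
print(re.findall(r'title=([^\s]*?) color=red', string))  # ['whatIwaht', 'xxxyyy']
```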

A third option, using a bit of code to find all red titles (assuming that the input always alternates title, color):

import re
# string is the input from the question
for title, color in re.findall(r'title=(.*?) color=(.*?)(?: |$)', string):
    if color == 'red':
        print(title)

Upvotes: 1
