Extract words/sentence that occurs before a keyword from a string - Python

Question

I have a string like this:

my_str ='·in this match, dated may 1, 2013 (the "the match") is between brooklyn centenniel, resident of detroit, michigan ("champion") and kamil kubaru, the challenger from alexandria, virginia ("underdog").'

Now, I want to extract the current champion and the underdog using the keywords "champion" and "underdog".

What makes this challenging is that both contenders' names appear before their keyword, which sits inside parentheses. I want to use a regular expression to extract this information.

Here is what I tried:

import re

champion = re.findall(r'("champion"[^.]*.)', my_str)
print(champion)

>> ['"champion") and kamil kubaru, the challenger from alexandria, virginia ("underdog").']


underdog = re.findall(r'("underdog"[^.]*.)', my_str)
print(underdog)

>> ['"underdog").']

However, the result I need for the champion is:

brooklyn centenniel, resident of detroit, michigan

and for the underdog:

kamil kubaru, the challenger from alexandria, virginia

How can I do this with a regular expression? (I have been searching for a way to go back a couple of words from the keyword to get the result I want, but with no luck yet.) Any help or suggestions would be appreciated.
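One way to "go back" from a keyword is to anchor on the word that introduces each name and stop the capture at the parenthesized label. The sketch below assumes each name starts right after "between" or "and" and is followed by a quoted label such as ("champion"); the helper name text_before is hypothetical, not a library function. Forbidding parentheses inside the lazy capture ([^()]*?) keeps the underdog search from swallowing the champion's text.

```python
import re


def text_before(keyword, text):
    """Return the phrase just before ("keyword") in text, or None.

    Hypothetical helper. Assumes each phrase starts right after the word
    "between" or "and" and is followed by a parenthesized, quoted label.
    """
    # [^()]*? is lazy and cannot cross a parenthesis, so a match that
    # starts at "between" cannot run past ("champion") to ("underdog").
    pattern = (r'\b(?:between|and)\s+([^()]*?)\s+\("'
               + re.escape(keyword) + r'"\)')
    match = re.search(pattern, text)
    return match.group(1) if match else None


my_str = ('·in this match, dated may 1, 2013 (the "the match") is between '
          'brooklyn centenniel, resident of detroit, michigan ("champion") and '
          'kamil kubaru, the challenger from alexandria, virginia ("underdog").')

print(text_before("champion", my_str))  # brooklyn centenniel, resident of detroit, michigan
print(text_before("underdog", my_str))  # kamil kubaru, the challenger from alexandria, virginia
```

If the text can introduce names with other words, the (?:between|and) alternation is the part to adjust.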

heemayl · Accepted Answer

You can use named capture groups to capture the desired results:

between\s+(?P<champion>.*?)\s+\("champion"\)\s+and\s+(?P<underdog>.*?)\s+\("underdog"\)

between\s+(?P<champion>.*?)\s+\("champion"\) matches the chunk from between up to ("champion") and puts the desired portion in between into the named capture group champion
After that, \s+and\s+(?P<underdog>.*?)\s+\("underdog"\) matches the chunk up to ("underdog") and again puts the desired portion into the named capture group underdog

Example:

In [26]: my_str ='·in this match, dated may 1, 2013 (the "the match") is between brooklyn centenniel, resident of detroit, michigan ("champion") and kamil kubaru, the challenger from alexandria, virginia ("underdog").'

In [27]: out = re.search(r'between\s+(?P<champion>.*?)\s+\("champion"\)\s+and\s+(?P<underdog>.*?)\s+\("underdog"\)', my_str)

In [28]: out.groupdict()
Out[28]: 
{'champion': 'brooklyn centenniel, resident of detroit, michigan',
 'underdog': 'kamil kubaru, the challenger from alexandria, virginia'}
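Outside IPython, the same pattern can be written as a standalone script. The sketch below restates the answer's regex with re.VERBOSE so each piece carries its own comment; re.VERBOSE ignores unescaped whitespace, which is safe here because the pattern matches spaces only through \s+.

```python
import re

my_str = ('·in this match, dated may 1, 2013 (the "the match") is between '
          'brooklyn centenniel, resident of detroit, michigan ("champion") and '
          'kamil kubaru, the challenger from alexandria, virginia ("underdog").')

pattern = re.compile(r'''
    between\s+
    (?P<champion>.*?)     # shortest text up to the champion label
    \s+\("champion"\)     # the literal ("champion")
    \s+and\s+
    (?P<underdog>.*?)     # shortest text up to the underdog label
    \s+\("underdog"\)     # the literal ("underdog")
''', re.VERBOSE)

match = pattern.search(my_str)
if match:
    print(match.groupdict())
```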
