xeroxSO

Reputation: 21

The right regular expression in Python

I have a small problem extracting the words that are in bold (Médoc, Margaux and Pessac-Léognan in the lines below):

Médoc, Rouge
2ème Vin, Margaux, Rosé
2ème vin, Pessac-Léognan, Blanc

To clarify my question: I'm trying to extract some information from web pages. Each page gives me a sentence like the ones above, but I'm only interested in the part that is in bold. Here are the addresses of the three web pages:

Any ideas?

Upvotes: 0

Views: 100

Answers (2)

Jerry

Reputation: 71568

It seems the term you want is always the second-to-last one in the comma-separated list. If so, you can split the string and select the second-to-last element, for example:

>>> myStr = '2ème vin, Pessac-Léognan, Blanc'
>>> res = myStr.split(', ')[-2]

Otherwise, if you want regex alone... I'll suggest this:

>>> import re
>>> res = re.search(r'([^,]+),[^,]+$', myStr).group(1)

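Then strip the result if necessary (for example with res.strip()), because the captured group keeps the space that follows the preceding comma.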
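
For reference, here is a minimal Python 3 sketch of both approaches applied to the three sample strings from the question. The helper names `second_to_last` and `second_to_last_re` are made up for illustration, and the sketch assumes the wanted term really is always the second-to-last field.

```python
import re

def second_to_last(s):
    """Split on commas and return the second-to-last field, stripped."""
    parts = [p.strip() for p in s.split(',')]
    # A string without any comma has no second-to-last field.
    return parts[-2] if len(parts) >= 2 else None

def second_to_last_re(s):
    """Same result using only a regex: capture the field before the last comma."""
    m = re.search(r'([^,]+),[^,]+$', s)
    # The captured group keeps the space after the previous comma, so strip it.
    return m.group(1).strip() if m else None

samples = ["Médoc, Rouge", "2ème Vin, Margaux, Rosé", "2ème vin, Pessac-Léognan, Blanc"]
for s in samples:
    print(second_to_last(s), second_to_last_re(s))
```

Both helpers return None instead of raising when the string contains no comma, which may or may not be what you want for malformed input.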

Upvotes: 1

alecxe

Reputation: 474003

You can use a positive lookahead to check whether Rouge, Blanc or Rosé comes right after the word we are looking for:

>>> import re
>>> l = [u"Médoc, Rouge", u"2ème Vin, Margaux, Rosé", u"2ème vin, Pessac-Léognan, Blanc"]
>>> for s in l:
...     print re.search(ur'([\w-]+)(?=\W+(Rouge|Blanc|Rosé))', s, re.UNICODE).group(0)
... 
Médoc
Margaux
Pessac-Léognan
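
The session above is Python 2 (the `print` statement and the `ur''` prefix). A rough Python 3 equivalent might look like the sketch below. The names `COLOUR_LOOKAHEAD` and `appellation` are hypothetical, and the sketch assumes the colour is always one of Rouge, Blanc or Rosé and that the accented characters are in composed form.

```python
import re

# In Python 3, str patterns are Unicode-aware by default, so \w matches
# accented letters without re.UNICODE and the ur'' prefix is not needed.
COLOUR_LOOKAHEAD = re.compile(r'([\w-]+)(?=\W+(?:Rouge|Blanc|Rosé))')

def appellation(s):
    """Return the word (letters, digits, hyphens) directly before the colour."""
    m = COLOUR_LOOKAHEAD.search(s)
    return m.group(1) if m else None

samples = ["Médoc, Rouge", "2ème Vin, Margaux, Rosé", "2ème vin, Pessac-Léognan, Blanc"]
for s in samples:
    print(appellation(s))
```

Note that `[\w-]+` captures a single word only, so a name containing spaces would yield just its last word.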

Upvotes: 2
