How to use regex in Python list

Question

I need your help: I want to apply a regex to each string in a list to get only the text after my keyword.

my_list looks like:

 ['Paris, 458 boulevard Saint-Germain', 'Marseille, 29 rue Camille Desmoulins', 'Marseille, 1 chemin des Aubagnens']

The regex:

re.compile(ur'(?<=rue|boulevard|quai|chemin).*', re.MULTILINE)

Expected list after processing:

['Saint-Germain', 'Camille Desmoulins', 'des Aubagnens']

Thanks for your help.

Wiktor Stribiżew · Accepted Answer

Your regex does not compile in Python: it throws the error "look-behind requires fixed-width pattern", because the alternatives inside the lookbehind have different lengths.
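To see the failure concretely, the sketch below tries to compile the pattern from the question (with the Python 2 ur prefix replaced by r and the flag dropped) and catches the resulting re.error. The second half is one possible workaround, not the approach recommended in this answer: it gives each keyword its own fixed-width lookbehind. The names broken, workaround, error_message and extracted are illustrative only.

```python
import re

lst = ['Paris, 458 boulevard Saint-Germain',
       'Marseille, 29 rue Camille Desmoulins',
       'Marseille, 1 chemin des Aubagnens']

# The lookbehind from the question: its alternatives have different lengths
broken = r'(?<=rue|boulevard|quai|chemin).*'
try:
    re.compile(broken)
    error_message = None
except re.error as exc:
    error_message = str(exc)
print(error_message)

# Workaround sketch (an assumption, not the answer's method):
# one fixed-width lookbehind per keyword, combined with |
workaround = re.compile(r'(?:(?<=rue)|(?<=boulevard)|(?<=quai)|(?<=chemin)).*')
extracted = [workaround.search(x).group().strip() for x in lst]
print(extracted)  # ['Saint-Germain', 'Camille Desmoulins', 'des Aubagnens']
```

Note that search stops at the first keyword it finds in each string, and that it returns None (so .group() would fail) for a string containing no keyword at all.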

Also note that the re.MULTILINE flag in your regex is redundant, since the pattern contains no ^ or $ whose behavior the flag would change.
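As a quick illustration of that point, here is a small sketch showing that re.MULTILINE only changes what ^ (and $) match. The sample text and the variable names are made up for the demonstration.

```python
import re

text = 'rue A\nrue B'  # hypothetical two-line input

# Without the flag, ^ matches only at the very start of the string
default_hits = re.findall(r'^rue', text)
# With the flag, ^ also matches right after each newline
multiline_hits = re.findall(r'^rue', text, re.MULTILINE)
print(default_hits)    # ['rue']
print(multiline_hits)  # ['rue', 'rue']

# A pattern without ^ or $ gives the same result either way
anchor_free = r'(?:rue|boulevard|quai|chemin).*'
address = 'Marseille, 29 rue Camille Desmoulins'
without_flag = re.findall(anchor_free, address)
with_flag = re.findall(anchor_free, address, re.MULTILINE)
print(without_flag == with_flag)  # True
```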

Here is the code you can use:

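import re

lst = ['Paris, 458 boulevard Saint-Germain', 'Marseille, 29 rue Camille Desmoulins', 'Marseille, 1 chemin des Aubagnens']
# Match everything up to and including the last street-type keyword
p = re.compile(r'.*(?:rue|boulevard|quai|chemin)')
# Remove the matched prefix, then trim the leftover whitespace
print([p.sub('', x).strip() for x in lst])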


Result:

['Saint-Germain', 'Camille Desmoulins', 'des Aubagnens']

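The r'.*(?:rue|boulevard|quai|chemin)' regex matches:

.* - zero or more characters other than a newline, as many as possible (greedy)
(?:rue|boulevard|quai|chemin) - one of the alternatives delimited with |.

Because .* is greedy, the match extends up to the last keyword in the string. The matched text is then removed with p.sub (the compiled-pattern form of re.sub).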
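One thing to keep in mind with the substitution approach is that a string containing none of the keywords comes back unchanged. As a sketch of an alternative (not the method used in this answer), the function below captures the text after the keyword in a group and returns None when nothing matches. The helper name street_name and the sample '10 Downing Street' are hypothetical.

```python
import re

# Same prefix as before, plus a capturing group for the text after the keyword
p = re.compile(r'.*(?:rue|boulevard|quai|chemin)\s*(.*)')

def street_name(address):
    """Return the text after the last keyword, or None if no keyword is found."""
    m = p.match(address)
    return m.group(1).strip() if m else None

lst = ['Paris, 458 boulevard Saint-Germain',
       'Marseille, 29 rue Camille Desmoulins',
       'Marseille, 1 chemin des Aubagnens']

names = [street_name(x) for x in lst]
print(names)  # ['Saint-Germain', 'Camille Desmoulins', 'des Aubagnens']

# No keyword present: the capture version signals it explicitly
missing = street_name('10 Downing Street')
print(missing)  # None
```

Returning None makes it easy to filter out or report addresses that do not follow the expected format, instead of silently keeping the whole string.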

NOTE: You can force whole-word matching with the \b word boundary, so that chemin is matched only as a whole word and not as part of chemins:

r'.*\b(?:rue|boulevard|quai|chemin)\b'
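The difference shows up as soon as a keyword appears inside a longer word. A minimal sketch, assuming a made-up address whose street name happens to end in "rue" (the names loose, strict and address are illustrative):

```python
import re

loose = re.compile(r'.*(?:rue|boulevard|quai|chemin)')
strict = re.compile(r'.*\b(?:rue|boulevard|quai|chemin)\b')

# Hypothetical address: "Larue" contains the keyword "rue"
address = 'Paris, 12 boulevard Larue'

# Without \b the greedy .* runs up to the "rue" inside "Larue",
# so the whole string is removed
loose_result = loose.sub('', address).strip()
# With \b only the standalone word "boulevard" qualifies
strict_result = strict.sub('', address).strip()

print(repr(loose_result))   # ''
print(repr(strict_result))  # 'Larue'
```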
