DevEx

Reputation: 4561

regex match for words with spaces in between them

I have the regex pattern ([^\s|:]+):\s*([^\s|:]+), which works well for name:jones|location:london|age:23. How can I extend it to cover keys and values that contain spaces, or that combine words with numbers? For example: full name:jones hardy|city and dialling code :london 0044|age:23 years

>>> ("full name", "jones hardy") ("city and dialling code", "london 0044")("age","23 years")

Upvotes: 0

Views: 9248

Answers (3)

WKPlus

Reputation: 7255

>>> import re
>>> s = "full name:jones hardy|city and dialling code :london 0044|age:23 years"
>>> r = r"([^|:]+?)\s*:\s*([^|:]+)"
>>> re.findall(r, s)
[('full name', 'jones hardy'), ('city and dialling code', 'london 0044'), ('age', '23 years')]

Because the first group is lazy and is followed by \s*, the space at the end of 'city and dialling code ' is eliminated.

But if there are spaces before '|', they will not be eliminated, because the second group is greedy:

>>> s="full name:jones hardy |city and dialling code :london 0044|age:23 years"
>>> re.findall(r, s)
[('full name', 'jones hardy '), ('city and dialling code', 'london 0044'), ('age', '23 years')]

There will be a space at the end of 'jones hardy '.
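One way to sidestep the trailing space without touching the pattern is to strip each captured group afterwards. This is only a sketch: the helper name `parse_pairs` is made up for illustration, and it assumes keys and values never contain '|' or ':'.

```python
import re

# Same pattern as above; the second group is greedy, so it can keep trailing spaces.
PATTERN = re.compile(r"([^|:]+?)\s*:\s*([^|:]+)")

def parse_pairs(text):
    """Hypothetical helper: return (key, value) tuples with outer whitespace removed."""
    return [(key.strip(), value.strip()) for key, value in PATTERN.findall(text)]

s = "full name:jones hardy |city and dialling code :london 0044|age:23 years"
pairs = parse_pairs(s)
print(pairs)
```

Stripping after the match keeps the regex simple, at the cost of a second pass over each captured string.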

EDIT

r"\s*([\w\s]+?)\s*:\s*([\w\s]+?)\s*(?:\||$)" will eliminate all spaces at the beginning and end of each key and value:

>>> s
'  full name: jones hardy | city and dialling code :london 0044|age:23 years'
>>> r=r"\s*([\w\s]+?)\s*:\s*([\w\s]+?)\s*(?:\||$)"
>>> re.findall(r, s)
[('full name', 'jones hardy'), ('city and dialling code', 'london 0044'), ('age', '23 years')]
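The same pattern can be wrapped in a small helper that returns a dict. This is a sketch, and `parse_fields` is a hypothetical name. One caveat follows from the character class: `[\w\s]` admits only word characters and whitespace, so a pair whose value contains punctuation such as a hyphen is skipped rather than captured. Swapping in `[^|:]` would be one way around that, assuming keys and values never contain '|' or ':'.

```python
import re

# Lazy captures; the surrounding \s* absorbs padding around each key and value.
FIELD = re.compile(r"\s*([\w\s]+?)\s*:\s*([\w\s]+?)\s*(?:\||$)")

def parse_fields(text):
    """Hypothetical helper: map each trimmed key to its trimmed value."""
    return dict(FIELD.findall(text))

s = "  full name: jones hardy | city and dialling code :london 0044|age:23 years"
fields = parse_fields(s)
print(fields["city and dialling code"])  # london 0044

# A hyphen is not in [\w\s], so the first pair here is dropped silently.
print(parse_fields("name:jones-hardy|age:23"))  # {'age': '23'}
```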

Upvotes: 2

Abhijit

Reputation: 63727

Simplify your regex to capture everything except the delimiters, which in your case are the colon : and the pipe |.

>>> import re
>>> r = r"([^:|]+)\s*:\s*([^:|]+)"
>>> st = "full name:jones hardy|city and dialling code :london 0044"
>>> re.findall(r, st)
[('full name', 'jones hardy'), ('city and dialling code ', 'london 0044')]
>>> st="name:jones|location:london|age:23"
>>> re.findall(r, st)
[('name', 'jones'), ('location', 'london'), ('age', '23')]
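Note the trailing space kept in 'city and dialling code ' above: the first group is greedy, so it swallows the space before the \s* gets a chance to match it. A short comparison, as a sketch, of the greedy group against a lazy one (`+?`):

```python
import re

s = "city and dialling code :london 0044"

# Greedy first group: consumes the space before the colon.
greedy = re.findall(r"([^:|]+)\s*:\s*([^:|]+)", s)

# Lazy first group: stops as soon as \s*: can match, leaving the space to \s*.
lazy = re.findall(r"([^:|]+?)\s*:\s*([^:|]+)", s)

print(greedy)  # [('city and dialling code ', 'london 0044')]
print(lazy)    # [('city and dialling code', 'london 0044')]
```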

Upvotes: 1

Chris Martin

Reputation: 30736

This situation seems like it calls for re.split.

>>> import re
>>> s = "full name:jones hardy|city and dialling " \
...     "code :london 0044|age:23 years"
>>> [tuple(re.split(r'\s*:\s*', t))
...  for t in re.split(r'\s*\|\s*', s)]
[('full name', 'jones hardy'),
 ('city and dialling code', 'london 0044'),
 ('age', '23 years')]
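If a value might itself contain a colon, splitting each field only once keeps the rest of the value intact. The following is a sketch of that variation. The helper name `split_pairs` and the `time:09:30` field are invented for illustration, and a field with no colon at all would raise a ValueError on unpacking.

```python
import re

def split_pairs(text):
    """Hypothetical helper: split on '|' first, then on the first ':' of each field."""
    pairs = []
    for field in re.split(r"\s*\|\s*", text.strip()):
        # maxsplit=1 keeps any later colon inside the value.
        key, value = re.split(r"\s*:\s*", field, maxsplit=1)
        pairs.append((key, value))
    return pairs

s = "full name:jones hardy|city and dialling code :london 0044|time:09:30"
print(split_pairs(s))
```

Because the whitespace is part of both split patterns, keys and values come back trimmed except at the very ends of the input, which `text.strip()` handles.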

Upvotes: 2
