Reputation: 4561
I have this regex pattern ([^\s|:]+):\s*([^\s|:]+), which works well for name:jones|location:london|age:23. How can I extend the pattern to cover keys and values that contain spaces, or words combined with numbers? For example, full name:jones hardy|city and dialling code :london 0044|age:23 years should give:
>>> ("full name", "jones hardy") ("city and dialling code", "london 0044") ("age", "23 years")
Upvotes: 0
Views: 9248
Reputation: 7255
>>> import re
>>> s = "full name:jones hardy|city and dialling code :london 0044|age:23 years"
>>> r = r"([^|:]+?)\s*:\s*([^|:]+)"
>>> re.findall(r, s)
[('full name', 'jones hardy'), ('city and dialling code', 'london 0044'), ('age', '23 years')]
Because the first group is lazy (+?) and is followed by \s*, the space at the end of 'city and dialling code ' is dropped.
But if there are spaces before '|', they are not removed:
>>> s="full name:jones hardy |city and dialling code :london 0044|age:23 years"
>>> re.findall(r, s)
[('full name', 'jones hardy '), ('city and dialling code', 'london 0044'), ('age', '23 years')]
There will be a space at the end of 'jones hardy '.
The pattern r"\s*([\w\s]+?)\s*:\s*([\w\s]+?)\s*(?:\||$)" removes all whitespace at the beginning and end of each captured string:
>>> s
' full name: jones hardy | city and dialling code :london 0044|age:23 years'
>>> r=r"\s*([\w\s]+?)\s*:\s*([\w\s]+?)\s*(?:\||$)"
>>> re.findall(r, s)
[('full name', 'jones hardy'), ('city and dialling code', 'london 0044'), ('age', '23 years')]
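As a rough sketch, the last pattern could be wrapped in a small helper. The names PAIR_RE and parse_pairs are hypothetical and not part of the answer above. Note that [\w\s] only admits word characters and whitespace, so a key or value containing punctuation would not be captured intact.

```python
import re

# Pattern from the answer above: lazy groups with \s* on both sides,
# terminated by a pipe or by the end of the string.
PAIR_RE = re.compile(r"\s*([\w\s]+?)\s*:\s*([\w\s]+?)\s*(?:\||$)")


def parse_pairs(text):
    """Return (key, value) tuples with surrounding whitespace removed."""
    return PAIR_RE.findall(text)


pairs = parse_pairs(
    " full name: jones hardy | city and dialling code :london 0044|age:23 years"
)
print(pairs)
```

If a mapping is more convenient than a list of tuples, dict(parse_pairs(s)) should work as long as the keys are unique.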
Upvotes: 2
Reputation: 63727
Simplify your regex to capture everything except the delimiters, which in your case are the colon : and the pipe |:
>>> import re
>>> r = r"([^:|]+)\s*:\s*([^:|]+)"
>>> st = "full name:jones hardy|city and dialling code :london 0044"
>>> re.findall(r, st)
[('full name', 'jones hardy'), ('city and dialling code ', 'london 0044')]
>>> st="name:jones|location:london|age:23"
>>> re.findall(r, st)
[('name', 'jones'), ('location', 'london'), ('age', '23')]
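Because the first group here is greedy, it keeps a trailing space in 'city and dialling code '. One possible way around that, sketched below, is to keep the simple delimiter-based pattern and strip the captured strings in Python afterwards. The names SIMPLE_PAIR_RE and parse_pairs_stripped are hypothetical, and this variant is an assumption about what is wanted, not part of the answer above.

```python
import re

# Capture everything that is not a delimiter on either side of the colon.
SIMPLE_PAIR_RE = re.compile(r"([^:|]+):([^:|]+)")


def parse_pairs_stripped(text):
    """Find key:value pairs, then trim whitespace with str.strip()."""
    return [(key.strip(), value.strip())
            for key, value in SIMPLE_PAIR_RE.findall(text)]


st = "full name:jones hardy |city and dialling code :london 0044|age:23 years"
print(parse_pairs_stripped(st))
```

Leaving the trimming to str.strip() keeps the pattern short, at the cost of a second pass over each captured string.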
Upvotes: 1
Reputation: 30736
This situation seems like it calls for re.split.
>>> import re
>>> s = "full name:jones hardy|city and dialling " \
... "code :london 0044|age:23 years"
>>> # raw strings avoid invalid-escape warnings for \s and \|
>>> [tuple(re.split(r'\s*:\s*', t))
... for t in re.split(r'\s*\|\s*', s)]
[('full name', 'jones hardy'),
('city and dialling code', 'london 0044'),
('age', '23 years')]
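The same two-step split could be written as a function, roughly as below. The name split_pairs is hypothetical. The maxsplit=1 argument and the initial strip() are additions that are not in the answer above: they assume a value may itself contain a colon and that the input may have leading or trailing whitespace.

```python
import re


def split_pairs(text):
    """Split on pipes, then split each field on its first colon."""
    pairs = []
    for field in re.split(r"\s*\|\s*", text.strip()):
        # maxsplit=1 keeps any later colon inside the value (an assumption).
        # A field without a colon raises ValueError on unpacking.
        key, value = re.split(r"\s*:\s*", field, maxsplit=1)
        pairs.append((key, value))
    return pairs


s = "full name:jones hardy|city and dialling code :london 0044|age:23 years"
print(split_pairs(s))
```

Unlike the findall approaches, this version fails loudly on a malformed field instead of silently skipping it, which may or may not be what you want.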
Upvotes: 2