Reputation: 3279
I need to capture tokens such as 11, 12-, -13 and 14-15.
I want to reject any string that contains tokens not of the forms above, such as 12-- and 4-5-6.
Tokens can be separated by any number of spaces, optionally including a single comma. So for the string:
43,5 67- -66,53-53 , 6
I wish to return
('43', '5', '67-', '-66', '53-53', '6')
This is what I have tried:
import re
num = r'\d{1,4}'
token = r'(?:-%s)|(?:%s-%s)|(?:%s-)|(?:%s)' % (num, num, num, num, num)
sep = r'\s*,?\s*'
valid = r'(%s)(?:%s(%s))*' % (token, sep, token)
test = re.compile(valid)
m = test.match("43,5 67- -66,53-53 , 6")
print(m.groups())
but it prints only the first and last numbers:
('43', '6')
Any help is greatly appreciated.
Upvotes: 1
Views: 254
Reputation: 368954
Use re.findall:
>>> re.findall(r'[-\d]+', '43,5 67- -66,53-53 , 6')
['43', '5', '67-', '-66', '53-53', '6']
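One caveat, shown as a quick check: the character class `[-\d]+` matches any run of digits and hyphens, so malformed tokens such as 12-- and 4-5-6 come through as well. The input string here is made up for illustration.

```python
import re

# Hypothetical input containing one valid and two malformed tokens.
loose = re.findall(r'[-\d]+', '11 12-- 4-5-6')

# The malformed tokens are returned alongside the valid one.
print(loose)
```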
UPDATE
Use negative lookaround assertions to exclude invalid matches.
>>> pattern = r'(?<![-\d])(\d+-\d+|-\d+|\d+-|\d+)(?![-\d])'
>>> re.findall(pattern, '43,5 67- -66,53-53 , 1--, 2, --3, -4-')
['43', '5', '67-', '-66', '53-53', '2']
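Note that the lookaround pattern skips invalid tokens rather than rejecting the whole string. If the string should be rejected outright, one possible approach is to validate the entire input first and only then extract the tokens. The sketch below assumes Python 3.4+ (for `fullmatch`), reuses the question's limit of 1 to 4 digits per number, and reads "a single comma" as at most one comma between two tokens. The names `parse_tokens`, `VALID` and `TOKEN_RE` are made up for this example. It also sidesteps the issue in the question's attempt: a repeated capturing group keeps only its last iteration, which is why `groups()` showed just the first and last numbers.

```python
import re

NUM = r'\d{1,4}'
# Range, leading hyphen, trailing hyphen, or bare number (non-capturing).
TOKEN = r'(?:{n}-{n}|-{n}|{n}-|{n})'.format(n=NUM)
# Either one comma with optional surrounding whitespace, or whitespace only.
SEP = r'(?:\s*,\s*|\s+)'

VALID = re.compile(r'\s*{t}(?:{s}{t})*\s*'.format(t=TOKEN, s=SEP))
TOKEN_RE = re.compile(TOKEN)


def parse_tokens(text):
    """Return a tuple of tokens, or None if any part of text is invalid."""
    if VALID.fullmatch(text) is None:
        return None
    # The whole string is known to be well formed, so findall is safe here.
    return tuple(TOKEN_RE.findall(text))


print(parse_tokens('43,5 67- -66,53-53 , 6'))
print(parse_tokens('43, 12--'))
print(parse_tokens('4-5-6'))
```

With this approach the first call yields the six tokens from the question, while the other two return None because 12-- and 4-5-6 make the whole string invalid.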
Upvotes: 5