Holy Mackerel

Reputation: 3279

Python regexp to capture numbers and dashes separated by spaces and comma

I need to capture tokens such as 11, 12-, -13 and 14-15.

I want to reject any string that contains an invalid token, meaning anything not of the forms above, such as 12-- or 4-5-6. Tokens can be separated by any number of spaces, which may or may not include a single comma. So for the string:

43,5 67- -66,53-53 , 6

I wish to return

('43', '5', '67-', '-66', '53-53', '6')

This is what I have tried:

import re

num = r'\d{1,4}'
token = r'(?:-%s)|(?:%s-%s)|(?:%s-)|(?:%s)' % (num, num, num, num, num)
sep = r'\s*,?\s*'
valid = r'(%s)(?:%s(%s))*' % (token, sep, token)

test = re.compile(valid)
m = test.match("43,5 67-  -66,53-53 , 6")
print(m.groups())

but it prints only the first and last numbers:

('43', '6')

Any help is greatly appreciated.

Upvotes: 1

Views: 254

Answers (1)

falsetru

Reputation: 368954

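Use re.findall. A repeated capturing group keeps only its last match, which is why m.groups() returns just the first and last tokens; re.findall returns every match instead: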

>>> re.findall(r'[-\d]+', '43,5 67- -66,53-53 , 6')
['43', '5', '67-', '-66', '53-53', '6']
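Note that the character class above is permissive: it treats any run of digits and dashes as a token, so malformed input passes through unchanged. A small sketch of that limitation, using the invalid examples from the question (the variable name `loose` is only illustrative):

```python
import re

# [-\d]+ matches any run of digits and dashes, valid token or not.
loose = re.findall(r'[-\d]+', '12-- 4-5-6')
print(loose)  # ['12--', '4-5-6']
```

This is why a stricter pattern is needed when invalid tokens must be excluded.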

UPDATE

Use negative lookaround assertions to exclude invalid matches.

>>> pattern = r'(?<![-\d])(\d+-\d+|-\d+|\d+-|\d+)(?![-\d])'
>>> re.findall(pattern, '43,5 67- -66,53-53 , 1--, 2, --3, -4-')
['43', '5', '67-', '-66', '53-53', '2']
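The lookaround pattern skips invalid tokens but still returns the valid ones around them. If the goal is to reject the whole string whenever any token is invalid, as the question asks, one possible approach is to validate the full string first and only then extract. The following is a minimal sketch under some assumptions: the helper name `parse_tokens` is hypothetical, a separator is taken to be either spaces only or a single comma with optional spaces around it, leading and trailing whitespace is allowed, and `\d+` is used instead of the question's `\d{1,4}`. It relies on `re.fullmatch`, available in Python 3.4 and later.

```python
import re

# Token alternatives taken from the answer's pattern (no limit on digit count).
TOKEN = r'(?:\d+-\d+|-\d+|\d+-|\d+)'
# Assumed separator: one comma with optional spaces around it, or spaces only.
SEP = r'(?:\s*,\s*|\s+)'
# Whole-string shape: tokens joined by separators, optional outer whitespace.
VALID = re.compile(r'\s*%s(?:%s%s)*\s*' % (TOKEN, SEP, TOKEN))

def parse_tokens(text):
    """Return the list of tokens, or None if the string has an invalid token."""
    if VALID.fullmatch(text) is None:
        return None
    # In a validated string, every run of digits and dashes is exactly one token.
    return re.findall(r'[-\d]+', text)

print(parse_tokens('43,5 67- -66,53-53 , 6'))  # ['43', '5', '67-', '-66', '53-53', '6']
print(parse_tokens('43,5 12-- 6'))             # None
print(parse_tokens('4-5-6'))                   # None
```

Returning None for invalid input is a design choice for this sketch; raising ValueError would work equally well if that fits the calling code better.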

Upvotes: 5
