Reputation: 103
I would like to ask a basic Python regular expression question. I have a dataset
line = "(1,2) (2,3)"
That can repeat many times so line can also be
line = "(1,2) (3,4) (6,5)"
I have a regular expression
rx = r"(\(\s*\d+\s*,\s*\d+\s*\)\s*){2,}$"
I want
a = re.match(rx, line).groups()
to match
('(1,2)','(3,4)'...)
But I can only match the last (6,5). I need the last $ because I don't know how many bracketed inputs I can have, otherwise an incorrect input such as
(1,2),(3,4),(5,6
will pass the regexp.
Any tips?
Edit: The data is not formatted exactly as described above. Instead, it looks like
line = 'blah(1,2) (2,3)blah'
So regular expressions are indeed needed.
Thanks
Upvotes: 3
Views: 163
Reputation: 6148
Note that Borgleader's answer leads to:
>>> re.findall(r'[\(\d+,\d+\)]{1,}', '(1, 2),(2,3)')
['(1,', '2),(2,3)']
Joran Beasley's answer for the above case gives:
>>> re.findall(r"(\([^)]*\))", '(1, 2),(2,3)')
['(1, 2)', '(2,3)']
But it is too inclusive:
>>> re.findall(r"(\([^)]*\))", '(1, blah2),(2,3)')
['(1, blah2)', '(2,3)']
If you wish to include only numbers, then:
>>> re.findall(r'\(\s*\d+\s*,\s*\d+\s*\)', '(1, 2),(2,3)')
['(1, 2)', '(2,3)']
>>> re.findall(r'\(\s*\d+\s*,\s*\d+\s*\)', '(1, blah2),(2,3)')
['(2,3)']
>>> re.findall(r'\(\s*\d+\s*,\s*\d+\s*\)', '(1, 2),(2,3) (6, 5')
['(1, 2)', '(2,3)']
If you want to remove any spaces in the final result:
>>> [x.replace(' ', '') for x in re.findall(r'\(\s*\d+\s*,\s*\d+\s*\)', '(1, 2),(2,3) (6, 5')]
['(1,2)', '(2,3)']
Or if there are tabs and such:
>>> sp = re.compile(r'\s')
>>> [sp.sub('', x) for x in re.findall(r'\(\s*\d+\s*,\s*\d+\s*\)', '(1, 2),( 2, 3 ) (6, 5')]
['(1,2)', '(2,3)']
Of course, the simpler the solution for your data set, the better.
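As for why the original pattern only returns the last pair: a capturing group that is repeated keeps only its final match, so `groups()` can never return every pair. One possible way around this, sketched below, is to validate the whole line with one pattern and then extract the pairs with `findall`. The names `PAIR`, `LINE` and `parse_pairs` are made up for this illustration, and the sketch assumes the line contains nothing but pairs; for input with surrounding text such as 'blah(1,2) (2,3)blah' the validation step would have to be relaxed or dropped.

```python
import re

# One "(int,int)" pair, allowing whitespace around the numbers.
PAIR = r"\(\s*\d+\s*,\s*\d+\s*\)"
# Whole line: two or more pairs, each optionally followed by whitespace.
# The group is non-capturing because a repeated group keeps only its last match.
LINE = re.compile(r"(?:" + PAIR + r"\s*){2,}")

def parse_pairs(line):
    """Return the list of pairs in line, or None if the line is malformed."""
    if LINE.fullmatch(line) is None:
        return None
    return re.findall(PAIR, line)

print(parse_pairs("(1,2) (3,4) (6,5)"))  # ['(1,2)', '(3,4)', '(6,5)']
print(parse_pairs("(1,2) (3,4) (5,6"))   # None
```

Validation and extraction are kept as separate steps here, so the strictness of the `$`-anchored pattern is preserved without relying on a single match object to hold every pair.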
Upvotes: 0
Reputation: 15916
If you really want to use regular expressions (I'm not a regex specialist, but it worked with the given data):
r = r"[\(\d+,\d+\)]{1,}"
c = re.findall(r,line)
or else follow nightcracker's excellent suggestion. Most often the simplest answer is the best answer.
EDIT: Thanks to Joran Beasley for the suggestion.
Upvotes: 4
Reputation: 117951
Behold, the magic of no regular expressions:
>>> "(1,2) (3,4) (6,5)".split()
['(1,2)', '(3,4)', '(6,5)']
Upvotes: 6