user1573235

Reputation: 103

Python Regular Expression Matching Many Times

I would like to ask a basic Python regular expression question. I have a dataset

line = "(1,2) (2,3)" 

The pattern can repeat many times, so line can also be

line = "(1,2) (3,4) (6,5)"

I have a regular expression

rx = r"(\(\s*\d+\s*,\s*\d+\s*\)\s*){2,}$"

I want

a = re.match(rx, line).groups()

to match

('(1,2)','(3,4)'...)

But I only get the last one, (6,5). I need the trailing $ because I don't know how many bracketed inputs there will be; without it, an incorrect input such as

(1,2),(3,4),(5,6

will pass the regexp.

Any tips?

Edit: The data is not formatted exactly as described above. Instead, it looks like

line= 'blah(1,2) (2,3)blah'

So regular expressions are indeed needed.

Thanks

Upvotes: 3

Views: 163

Answers (4)

user650654

Reputation: 6148

Note that Borgleader's answer leads to:

>>> re.findall(r'[\(\d+,\d+\)]{1,}', '(1, 2),(2,3)')
['(1,', '2),(2,3)']

Joran Beasley's answer for the above case gives:

>>> re.findall(r"(\([^)]*\))", '(1, 2),(2,3)')
['(1, 2)', '(2,3)']

But it is too inclusive:

>>> re.findall(r"(\([^)]*\))", '(1, blah2),(2,3)')
['(1, blah2)', '(2,3)']

If you wish to include only numbers, then:

>>> re.findall(r'\(\s*\d+\s*,\s*\d+\s*\)', '(1, 2),(2,3)')
['(1, 2)', '(2,3)']
>>> re.findall(r'\(\s*\d+\s*,\s*\d+\s*\)', '(1, blah2),(2,3)')
['(2,3)']
>>> re.findall(r'\(\s*\d+\s*,\s*\d+\s*\)', '(1, 2),(2,3) (6, 5')
['(1, 2)', '(2,3)']

If you want to remove any spaces in the final result:

>>> [x.replace(' ', '') for x in re.findall(r'\(\s*\d+\s*,\s*\d+\s*\)', '(1, 2),(2,3) (6, 5')]
['(1,2)', '(2,3)']

Or if there are tabs and such:

>>> sp = re.compile(r'\s')
>>> [sp.sub('', x) for x in re.findall(r'\(\s*\d+\s*,\s*\d+\s*\)', '(1, 2),( 2, 3 ) (6, 5')]
['(1,2)', '(2,3)']

Of course, the simpler the solution for your data set, the better.
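None of the findall calls above reject malformed input; the last example silently drops the unterminated (6, 5. One way to get both behaviours is to validate the whole line first and only then extract the pairs. This is a minimal sketch, assuming the pairs are separated by whitespace only (as in the question's first examples) and that there is no surrounding text; the name extract_pairs is made up for illustration.

```python
import re

# One "(number, number)" pair, with optional whitespace inside the brackets.
PAIR = r"\(\s*\d+\s*,\s*\d+\s*\)"

# Assumption: a valid line is two or more pairs separated by optional whitespace.
# The group is non-capturing because it is used for validation only.
VALID = re.compile(r"\s*(?:" + PAIR + r"\s*){2,}")


def extract_pairs(line):
    """Return every pair in line, or None if the line is malformed."""
    if VALID.fullmatch(line) is None:
        return None
    return re.findall(PAIR, line)


print(extract_pairs("(1,2) (3,4) (6,5)"))
print(extract_pairs("(1,2) (3,4) (5,6"))
```

The first call should print the three pairs and the second should print None. fullmatch plays the role of the trailing $ in the question's pattern, so the extraction pattern itself can stay simple.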

Upvotes: 0

Borgleader

Reputation: 15916

If you really want to use regular expressions (I'm not a regex specialist, but it worked with the given data):

r = "[\(\d+,\d+\)]{1,}"
c = re.findall(r,line)

Or else follow nightcracker's excellent suggestion. Most often the simplest answer is the best answer.

EDIT: Thanks to Joran Beasley for the suggestion.

Upvotes: 4

Igor Serebryany

Reputation: 3341

Try using re.findall(rx, line).
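One caveat, as far as I can tell: with the question's rx unchanged, findall still returns only the last pair, because a capturing group that is repeated keeps just its final repetition. A small sketch of the difference, assuming the rx and line from the question; pair_rx is a name introduced here for illustration.

```python
import re

line = "(1,2) (3,4) (6,5)"

# The question's pattern: a single capturing group repeated with {2,}.
rx = r"(\(\s*\d+\s*,\s*\d+\s*\)\s*){2,}$"

# findall returns the group's content, and a repeated group keeps
# only its last repetition.
last_only = re.findall(rx, line)

# Without the outer repetition, the group and the anchor, findall
# returns every non-overlapping match of a single pair.
pair_rx = r"\(\s*\d+\s*,\s*\d+\s*\)"
pairs = re.findall(pair_rx, line)

print(last_only)  # ['(6,5)']
print(pairs)      # ['(1,2)', '(3,4)', '(6,5)']
```

Note that pair_rx on its own no longer checks that the whole line is well formed, so that check would have to be done separately.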

Upvotes: 1

orlp

Reputation: 117951

Behold, the magic of no regular expressions:

>>> "(1,2) (3,4) (6,5)".split()
['(1,2)', '(3,4)', '(6,5)']
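If the malformed-input check from the question matters, the same split-based idea can be extended a little. This is only a sketch, assuming the pairs are separated by whitespace, contain no whitespace themselves, and have no surrounding text; parse_pairs is a hypothetical helper name.

```python
def parse_pairs(line):
    """Split on whitespace and turn each "(a,b)" token into a tuple of ints.

    Raises ValueError if any token is not a bracketed pair of integers.
    """
    pairs = []
    for token in line.split():
        if not (token.startswith("(") and token.endswith(")")):
            raise ValueError("malformed pair: " + token)
        # Unpacking fails with ValueError unless there are exactly two fields.
        a, b = token[1:-1].split(",")
        pairs.append((int(a), int(b)))
    return pairs


print(parse_pairs("(1,2) (3,4) (6,5)"))  # [(1, 2), (3, 4), (6, 5)]
```

An unterminated token such as (5,6 raises ValueError here instead of passing silently. Keep in mind that int() is more lenient than \d+ (it accepts a leading sign, for example).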

Upvotes: 6
