thkang

Reputation: 11543

python regex simple help - dealing with parentheses

I have a bunch of data in this form:

xxx(xx.xx), where the number of digits is not fixed, like:

312(21.1)   378(25.5)   374(25.3)   157(10.6)   260(17.6)   1481(100)
125(28.1)   91(20.4)    94(21.1)    52(11.7)    83(18.7)    445(100)
50(28.4)    44(25)  29(16.5)    12(6.8) 41(23.3)    176(100)

Note that they are all unicode strings. Let's call the number to the left of the parentheses A and the number inside the parentheses B, so each entry is A(B).

What I want to do is create a function that returns the list [A, B].

I know this can be done with regexes, but I'm not really good at them. Anyway, I did some searching and followed tutorials, and came up with:

re.search('\(.*?\)',s) # for B
re.search('.?\(',s) # for A

The problem is that they return the parentheses along with the numbers, like:

>>> s
u'312(21.1)'
>>> m = re.search('\(.*?\)', s)
>>> m.group()
u'(21.1)'

Any help would be appreciated.

Upvotes: 2

Views: 133

Answers (4)

Allan Deamon

Reputation: 487

Alternatively, you can use plain string operations, which are simpler and faster than regular expressions in many cases.

>>> s = "312(21.1)   378(25.5)   374(25.3)   157(10.6)   260(17.6)   1481(100) 125(28.1)   91(20.4)    94(21.1)    52(11.7)    83(18.7)    445(100) 50(28.4)    44(25)  29(16.5)    12(6.8) 41(23.3)    176(100)"

Split it into tokens:

>>> tokens = s.split()
>>> tokens
['312(21.1)', '378(25.5)', '374(25.3)', '157(10.6)', '260(17.6)', '1481(100)', '125(28.1)', '91(20.4)', '94(21.1)', '52(11.7)', '83(18.7)', '445(100)', '50(28.4)', '44(25)', '29(16.5)', '12(6.8)', '41(23.3)', '176(100)']

Remove the ')' at the end of each token:

>>> intermediary1 = [ entry[:-1] for entry in tokens ]
>>> intermediary1
['312(21.1', '378(25.5', '374(25.3', '157(10.6', '260(17.6', '1481(100', '125(28.1', '91(20.4', '94(21.1', '52(11.7', '83(18.7', '445(100', '50(28.4', '44(25', '29(16.5', '12(6.8', '41(23.3', '176(100']

Break each entry into two strings at the '(':

>>> intermediary2 = [ entry.split('(') for entry in intermediary1 ]
>>> intermediary2
[['312', '21.1'], ['378', '25.5'], ['374', '25.3'], ['157', '10.6'], ['260', '17.6'], ['1481', '100'], ['125', '28.1'], ['91', '20.4'], ['94', '21.1'], ['52', '11.7'], ['83', '18.7'], ['445', '100'], ['50', '28.4'], ['44', '25'], ['29', '16.5'], ['12', '6.8'], ['41', '23.3'], ['176', '100']]

Convert to numbers (integer, float):

>>> numbers = [ ( int(num1), float(num2) ) for num1, num2 in intermediary2 ]
>>> numbers
[(312, 21.1), (378, 25.5), (374, 25.3), (157, 10.6), (260, 17.6), (1481, 100.0), (125, 28.1), (91, 20.4), (94, 21.1), (52, 11.7), (83, 18.7), (445, 100.0), (50, 28.4), (44, 25.0), (29, 16.5), (12, 6.8), (41, 23.3), (176, 100.0)]

Or, more compactly, with the first three steps folded into a single list comprehension:

>>> tokens = [ entry[:-1].split('(') for entry in s.split()]
>>> numbers = [ ( int(num1), float(num2) ) for num1, num2 in tokens ]
>>> numbers
[(312, 21.1), (378, 25.5), (374, 25.3), (157, 10.6), (260, 17.6), (1481, 100.0), (125, 28.1), (91, 20.4), (94, 21.1), (52, 11.7), (83, 18.7), (445, 100.0), (50, 28.4), (44, 25.0), (29, 16.5), (12, 6.8), (41, 23.3), (176, 100.0)]
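Since the question asks for a function that returns [A, B], the same string-only steps could be wrapped up roughly like this. It is only a sketch: the names parse_entry and parse_row are made up for illustration, and it uses str.partition instead of slicing so that a malformed token raises an error rather than silently giving a wrong result.

```python
def parse_entry(entry):
    """Split one 'A(B)' token into [A, B] using only string methods."""
    # partition() splits at the first '(' and returns (before, sep, after).
    left, sep, right = entry.strip().partition('(')
    if not sep or not right.endswith(')'):
        raise ValueError('expected A(B), got %r' % entry)
    # Drop the trailing ')' before converting B.
    return [int(left), float(right[:-1])]


def parse_row(row):
    """Parse a whitespace-separated row of 'A(B)' tokens."""
    return [parse_entry(token) for token in row.split()]


print(parse_entry(u'312(21.1)'))                     # [312, 21.1]
print(parse_row(u'50(28.4)    44(25)  29(16.5)'))    # [[50, 28.4], [44, 25.0], [29, 16.5]]
```

If you would rather keep the values as strings, return [left, right[:-1]] instead of converting.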

Upvotes: 1

varagrawal

Reputation: 3032

I guess this should help better:

m = re.findall(r'([0-9]+\.[0-9]+|[0-9]+)', s)

This relies on the decimal point in the string. The pattern matches one or more digits 0-9, then a decimal point, then one or more digits 0-9 again; as an alternative, it also matches a plain run of digits 0-9. The decimal alternative has to come first, otherwise '21.1' would be split into '21' and '1'. The outer parentheses group the matched expression.

Your solution gives the parentheses because you are asking the regular expression to match the parentheses in the string as well.

This returns the two numbers as a Python list stored in m.
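To make that concrete, here is a small sketch of what findall gives back for a single entry and for a whole row. The names NUMBER, single, row and pairs are only for illustration. Note that for a row the result is one flat list, so the A/B pairs have to be rebuilt afterwards, and that pairing assumes every entry is well formed.

```python
import re

# Same pattern as above, compiled once; the decimal alternative comes first.
NUMBER = re.compile(r'([0-9]+\.[0-9]+|[0-9]+)')

# One entry: findall returns the two numbers as strings.
single = NUMBER.findall(u'312(21.1)')
print(single)  # ['312', '21.1'] on Python 3

# A whole row: findall returns one flat list of all the numbers.
row = NUMBER.findall(u'50(28.4)    44(25)  29(16.5)')

# Rebuild (A, B) pairs by taking the even and odd positions.
pairs = list(zip(row[0::2], row[1::2]))
print(pairs)  # [('50', '28.4'), ('44', '25'), ('29', '16.5')] on Python 3
```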

Hope it solves your problem. :)

Upvotes: 1

Marc Cohen

Reputation: 3808

import re
import sys

# Capture A (before the parentheses) and B (inside them).
pattern = re.compile(r'(.*)\((.*)\)')

li = []
for line in sys.stdin:
  for token in line.split():
    m = pattern.search(token)
    if m:  # skip tokens that are not of the form A(B)
      li.append((m.group(1), m.group(2)))

print(li)

Sample output:

$ python y
312(21.1)   378(25.5)   374(25.3)   157(10.6)   260(17.6)   1481(100)
[('312', '21.1'), ('378', '25.5'), ('374', '25.3'), ('157', '10.6'), ('260', '17.6'), ('1481', '100')]

Upvotes: 0

rkhayrov

Reputation: 10260

Use unescaped parentheses to define groups:

>>> re.findall(r'([0-9]+)\(([0-9]+|[0-9]+\.[0-9]+)\)', s)
[('312', '21.1'), ('378', '25.5'), ('374', '25.3'), ('157', '10.6'),
('260', '17.6'), ('1481', '100'), ('125', '28.1'), ('91', '20.4'),
('94', '21.1'), ('52', '11.7'), ('83', '18.7'), ('445', '100'),
('50', '28.4'), ('44', '25'), ('29', '16.5'), ('12', '6.8'),
('41', '23.3'), ('176', '100')]
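To tie this back to the question: escaped parentheses match the literal characters, while an unescaped pair captures whatever is between them, so group(1) gives the number without the parentheses. Below is a rough sketch of the requested function built on that idea. The names PAIR and parse_pair are hypothetical, and the pattern is a slightly rewritten variant of the one above (optional fractional part, anchored at the end), not the exact original.

```python
import re

s = u'312(21.1)'

# \( and \) match literal parentheses; the unescaped ( ) pair captures.
m = re.search(r'\((.*?)\)', s)
print(m.group())   # whole match, parentheses included: (21.1)
print(m.group(1))  # captured text only: 21.1

# Variant of the pattern above: B is digits with an optional fractional part.
PAIR = re.compile(r'([0-9]+)\(([0-9]+(?:\.[0-9]+)?)\)$')


def parse_pair(text):
    """Return [A, B] for a single 'A(B)' entry, as int and float."""
    match = PAIR.match(text.strip())
    if match is None:
        raise ValueError('expected A(B), got %r' % text)
    a, b = match.groups()
    return [int(a), float(b)]


print(parse_pair(s))  # [312, 21.1]
```

For a whole row, something like [parse_pair(t) for t in row.split()] should do.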

Upvotes: 2
