Reputation: 11543
I have a bunch of data in the form xxx(xx.xx), where the number of digits is not fixed. For example:
312(21.1) 378(25.5) 374(25.3) 157(10.6) 260(17.6) 1481(100)
125(28.1) 91(20.4) 94(21.1) 52(11.7) 83(18.7) 445(100)
50(28.4) 44(25) 29(16.5) 12(6.8) 41(23.3) 176(100)
Note that they are all unicode strings. Let's call the number to the left of the parentheses A and the number inside the parentheses B, so each item is A(B).
What I want is a function that returns the list [A, B].
I know this can be done with regular expressions, but I'm not very good at them. After some searching and a few tutorials, I came up with:
re.search('\(.*?\)',s) # for B
re.search('.?\(',s) # for A
The problem is that the matches include the parentheses along with the numbers. For example:
>>> s
u'312(21.1)'
>>> m=re.search('\(.*?\)',s)
>>> m.group()
(21.1)
Any help would be appreciated.
Upvotes: 2
Views: 133
Reputation: 487
Alternatively, you can use plain string operations, which are often simpler and faster than regular expressions.
>>> s = "312(21.1) 378(25.5) 374(25.3) 157(10.6) 260(17.6) 1481(100) 125(28.1) 91(20.4) 94(21.1) 52(11.7) 83(18.7) 445(100) 50(28.4) 44(25) 29(16.5) 12(6.8) 41(23.3) 176(100)"
Split it into tokens:
>>> tokens = s.split()
>>> tokens
['312(21.1)', '378(25.5)', '374(25.3)', '157(10.6)', '260(17.6)', '1481(100)', '125(28.1)', '91(20.4)', '94(21.1)', '52(11.7)', '83(18.7)', '445(100)', '50(28.4)', '44(25)', '29(16.5)', '12(6.8)', '41(23.3)', '176(100)']
Remove the trailing ')' from each token:
>>> intermediary1 = [ entry[:-1] for entry in tokens ]
>>> intermediary1
['312(21.1', '378(25.5', '374(25.3', '157(10.6', '260(17.6', '1481(100', '125(28.1', '91(20.4', '94(21.1', '52(11.7', '83(18.7', '445(100', '50(28.4', '44(25', '29(16.5', '12(6.8', '41(23.3', '176(100']
Split each entry at '(' into two strings:
>>> intermediary2 = [ entry.split('(') for entry in intermediary1 ]
>>> intermediary2
[['312', '21.1'], ['378', '25.5'], ['374', '25.3'], ['157', '10.6'], ['260', '17.6'], ['1481', '100'], ['125', '28.1'], ['91', '20.4'], ['94', '21.1'], ['52', '11.7'], ['83', '18.7'], ['445', '100'], ['50', '28.4'], ['44', '25'], ['29', '16.5'], ['12', '6.8'], ['41', '23.3'], ['176', '100']]
Convert them to numbers (integer, float):
>>> numbers = [ ( int(num1), float(num2) ) for num1, num2 in intermediary2 ]
>>> numbers
[(312, 21.1), (378, 25.5), (374, 25.3), (157, 10.6), (260, 17.6), (1481, 100.0), (125, 28.1), (91, 20.4), (94, 21.1), (52, 11.7), (83, 18.7), (445, 100.0), (50, 28.4), (44, 25.0), (29, 16.5), (12, 6.8), (41, 23.3), (176, 100.0)]
Or, more compactly, by combining the first three steps into one list comprehension:
>>> tokens = [ entry[:-1].split('(') for entry in s.split()]
>>> numbers = [ ( int(num1), float(num2) ) for num1, num2 in tokens ]
>>> numbers
[(312, 21.1), (378, 25.5), (374, 25.3), (157, 10.6), (260, 17.6), (1481, 100.0), (125, 28.1), (91, 20.4), (94, 21.1), (52, 11.7), (83, 18.7), (445, 100.0), (50, 28.4), (44, 25.0), (29, 16.5), (12, 6.8), (41, 23.3), (176, 100.0)]
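Since the question asks for a function that returns [A, B] for a single string, the same string operations can be wrapped up roughly like this. It is only a minimal sketch: the name split_pair is made up for illustration, and it assumes every input has exactly the form A(B) with one pair of parentheses.

```python
def split_pair(token):
    """Split a string such as '312(21.1)' into [A, B] as numbers.

    Hypothetical helper; assumes the token has the exact form A(B).
    """
    # partition() splits at the first '(' and keeps the rest in one piece.
    left, _, right = token.strip().partition('(')
    # Drop the trailing ')' before converting the second number.
    return [int(left), float(right.rstrip(')'))]


print(split_pair(u'312(21.1)'))                              # [312, 21.1]
print([split_pair(t) for t in u'50(28.4) 44(25)'.split()])   # [[50, 28.4], [44, 25.0]]
```

A malformed token would raise ValueError from int() or float(), which may or may not be what you want.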
Upvotes: 1
Reputation: 3032
I think this should work better:
m = re.findall(r'([0-9]+\.[0-9]+|[0-9]+)', s)
This makes use of the decimal point in the string. The pattern first tries to match one or more digits 0-9, then a decimal point, then one or more digits 0-9 again; as an alternative, it matches a plain run of digits 0-9. The parentheses in the pattern group the matched expression.
Your solution gives the parentheses because you are asking the regular expression to match the parentheses in the string as well.
For a single A(B) string, this returns the two numbers as a Python list stored in m.
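A small sketch of what that call returns, using the same pattern without the outer group (findall returns the whole match either way). Note that on a string with several A(B) entries the result is one flat list, so the numbers have to be paired up afterwards; that pairing step is an addition here, not part of the pattern itself.

```python
import re

# Decimal number first, plain integer as the fallback alternative.
NUMBER = r'[0-9]+\.[0-9]+|[0-9]+'

one = re.findall(NUMBER, u'312(21.1)')
print(one)  # ['312', '21.1']

# With several entries the matches come back as one flat list ...
flat = re.findall(NUMBER, u'50(28.4) 44(25)')
# ... so pair them up two at a time: (A, B), (A, B), ...
pairs = list(zip(flat[0::2], flat[1::2]))
print(pairs)  # [('50', '28.4'), ('44', '25')]
```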
Hope it solves your problem. :)
Upvotes: 1
Reputation: 3808
import re
import sys

li = []
while True:
    line = sys.stdin.readline()
    if not line:
        break  # end of input
    for i in line.split():
        # group 1 is the part before '(', group 2 the part inside the parentheses
        m = re.search(r'(.*)\((.*)\)', i)
        tup = (m.group(1), m.group(2))
        li.append(tup)
print(li)
Sample output:
$ python y
312(21.1) 378(25.5) 374(25.3) 157(10.6) 260(17.6) 1481(100)
[('312', '21.1'), ('378', '25.5'), ('374', '25.3'), ('157', '10.6'), ('260', '17.6'), ('1481', '100')]
Upvotes: 0
Reputation: 10260
Use unescaped parentheses to define groups:
>>> [g[:2] for g in re.findall(r'([0-9]+)\(([0-9]+|[0-9]+\.[0-9]+)\)', s)]
[('312', '21.1'), ('378', '25.5'), ('374', '25.3'), ('157', '10.6'),
('260', '17.6'), ('1481', '100'), ('125', '28.1'), ('91', '20.4'),
('94', '21.1'), ('52', '11.7'), ('83', '18.7'), ('445', '100'),
('50', '28.4'), ('44', '25'), ('29', '16.5'), ('12', '6.8'),
('41', '23.3'), ('176', '100')]
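For a single string, as in the question, the same idea can be sketched with re.match and group(1)/group(2) instead of group(). The function name parse_pair is hypothetical, and the pattern assumes A is an integer and B is either an integer or a decimal.

```python
import re

# Unescaped parentheses capture; escaped ones match the literal '(' and ')'.
PAIR = re.compile(r'([0-9]+)\(([0-9]+(?:\.[0-9]+)?)\)$')


def parse_pair(token):
    """Return [A, B] for a string of the form A(B), or None if it does not match."""
    m = PAIR.match(token)
    if m is None:
        return None
    return [m.group(1), m.group(2)]


print(parse_pair(u'312(21.1)'))  # ['312', '21.1']
print(parse_pair(u'44(25)'))     # ['44', '25']
print(parse_pair(u'oops'))       # None
```

The values come back as strings; wrap them in int() and float() if you need numbers.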
Upvotes: 2