Reputation: 7759
I have this string
circle,4.5
square,3.1
circle,2.0
triangle,4.7,4.9
square,4.1
circle,4.3
Let's say I want to capture the name of the shape and the two numbers next to it. I've tried this, and I've commented on the issue I have inside it:
>>> ma = re.search(r"(\w+)[,(\d+.\d+)]+", "Triangle,3.4,1.2")
>>> ma.group()
'Triangle,3.4,1.2'
>>> ma.group(1)
'Triangle'
>>> ma.group(2) ##Why is this happening ???
Traceback (most recent call last):
File "<pyshell#29>", line 1, in <module>
ma.group(2)
IndexError: no such group
I guess I can't put capturing groups inside square brackets?
Upvotes: 2
Views: 322
Reputation: 43683
Using .split(',') is the easiest way; however, if you want to use a regex, then you could use
ma = re.search(r"^([^,]+),([^,]+)(?:,([^,]+))?", "Triangle,3.4,1.2")
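To see how that pattern behaves on rows with one number versus two, here is a small sketch. The helper name `parse_shape` is made up for illustration; the pattern itself is the one above. The optional third group comes back as None when a row has only one number.

```python
import re

# Pattern from the answer: name, first number, optional second number.
SHAPE_RE = re.compile(r"^([^,]+),([^,]+)(?:,([^,]+))?")

def parse_shape(line):
    """Return (name, first, second); second is None when absent."""
    m = SHAPE_RE.search(line)
    if m is None:
        return None  # no comma at all, so nothing to capture
    return m.groups()

print(parse_shape("Triangle,3.4,1.2"))  # ('Triangle', '3.4', '1.2')
print(parse_shape("circle,4.5"))        # ('circle', '4.5', None)
```

Note that this only covers up to two numbers; a third number on a row would be silently ignored.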
Upvotes: 1
Reputation: 1124288
Square brackets are special; they mark all characters inside of them as a character class. You are asking to match any one of: a digit (\d), a , comma, a + plus sign, a . full stop, a ( opening parenthesis or a ) closing parenthesis. In other words, the opening and closing parentheses are part of the matched characters and do not denote a capturing group.
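A quick way to convince yourself of this is to test the character class one character at a time. This is only a sketch; the variable names `char_class` and `original` are mine, not from the question.

```python
import re

# Inside [...], the characters ( ) + . are literals, not regex operators.
char_class = re.compile(r"[,(\d+.\d+)]")

# Each of these single characters is accepted by the class.
for ch in ",(+.)7":
    assert char_class.fullmatch(ch) is not None

# A letter is not in the class, so it is rejected.
assert char_class.fullmatch("a") is None

# Only the one real pair of parentheses counts as a capturing group,
# which is why ma.group(2) raises IndexError.
original = re.compile(r"(\w+)[,(\d+.\d+)]+")
print(original.groups)  # 1
```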
You don't need a character class at all here; you are looking for a more specific pattern: a number, followed by a full stop, followed by another number. Use a non-capturing group ((?:...)) to group the number format together with the comma, so that repeated groups of numbers match:
r"(\w+)(?:,(\d+\.\d+))+"
Unfortunately, this still won't capture more than one number for you; a pattern never produces a variable number of groups, and a repeated group only keeps its last match. We've defined only two groups here, so that's all we get:
>>> import re
>>> ma = re.search(r"(\w+)(?:,(\d+\.\d+))+", "Triangle,3.4,1.2")
>>> ma.groups()
('Triangle', '1.2')
See "Regex question about parsing method signature" and "python regex repetition with capture question" for other SO questions that ran into this limitation.
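If a regex is still wanted, one common workaround is to do it in two steps: match the name plus the whole tail of numbers, then pull the individual numbers out of the tail with re.findall. The sketch below assumes every number has the form digits, full stop, digits; `parse_with_regex` is a hypothetical helper name.

```python
import re

def parse_with_regex(line):
    """Two-step parse: match the name, then collect every number."""
    m = re.match(r"(\w+)((?:,\d+\.\d+)+)$", line)
    if m is None:
        return None
    name, tail = m.groups()
    # findall returns one entry per match, so the count can vary.
    numbers = re.findall(r"\d+\.\d+", tail)
    return name, numbers

print(parse_with_regex("Triangle,3.4,1.2"))  # ('Triangle', ['3.4', '1.2'])
print(parse_with_regex("circle,4.5"))        # ('circle', ['4.5'])
```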
Your format is actually very simple, and you'd be much better off not using regular expressions at all. Simply split on the , comma and be done with it:
>>> "Triangle,3.4,1.2".split(',')
['Triangle', '3.4', '1.2']
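Applied to the whole multi-line string, the split approach could look like the following sketch. The function name `parse_shapes` and the conversion to float are my additions, not something the question asked for.

```python
def parse_shapes(text):
    """Split each non-empty line on commas; convert the numbers to float."""
    shapes = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        # First field is the name, the rest are the numbers (one or more).
        name, *numbers = line.strip().split(",")
        shapes.append((name, [float(n) for n in numbers]))
    return shapes

data = """circle,4.5
square,3.1
triangle,4.7,4.9"""

print(parse_shapes(data))
# [('circle', [4.5]), ('square', [3.1]), ('triangle', [4.7, 4.9])]
```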
Upvotes: 1
Reputation: 213351
Square brackets have a special meaning: they create a character class. Anything you put inside them, parentheses included, is treated as a set of individual characters, any one of which may match at that position. A capture group written inside square brackets is therefore not a group at all. You would have to use another capture group in place of the square brackets.
But you can simply use split for this case:
>>> s = "Triangle,3.4,1.2"
>>> s.split(",")
['Triangle', '3.4', '1.2']
>>> s = "circle,4.5"
>>> s.split(",")
['circle', '4.5']
>>> s.split(",")[0]
'circle'
>>> s.split(",")[1]
'4.5'
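As for your regex:
ma = re.search(r"(\w+)[,(\d+.\d+)]+", "Triangle,3.4,1.2")
You are using the character class [,(\d+.\d+)], which matches a single character that is a comma, a parenthesis, a plus sign, a full stop or a digit. It does not match a comma followed by a number.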
You should change it to:
ma = re.search(r"(\w+)(,(\d+.\d+))+", "Triangle,3.4,1.2")
But there is still a problem in this case. The pattern defines only three capturing groups, however many numbers follow the name:
group 0 -> complete match
group 1 -> `Triangle` (the name)
group 2 -> `,1.2` (outer parentheses)
group 3 -> `1.2` (inner parentheses)
You will not get 3.4, because (,(\d+.\d+)) first matches ,3.4 and then ,1.2, and a repeated group only remembers its last match, ,1.2.
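Running the nested-group pattern makes it easy to check what each group ends up holding. This is just the pattern above, evaluated as a script:

```python
import re

ma = re.search(r"(\w+)(,(\d+.\d+))+", "Triangle,3.4,1.2")

print(ma.group(0))  # Triangle,3.4,1.2
# groups() lists the capturing groups only (group 0 is excluded).
# The repeated groups hold their last match, so 3.4 is gone.
print(ma.groups())  # ('Triangle', ',1.2', '1.2')
```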
Upvotes: 1