Rafael Adel

Reputation: 7759

How to capture the values using regular expressions?

I have this string

circle,4.5
square,3.1
circle,2.0
triangle,4.7,4.9
square,4.1
circle,4.3

Let's say I want to capture the name of the shape and the two numbers next to it. I've tried this, and I've marked the issue I have with a comment:

>>> ma = re.search(r"(\w+)[,(\d+.\d+)]+", "Triangle,3.4,1.2")
>>> ma.group()
'Triangle,3.4,1.2'
>>> ma.group(1)
'Triangle'
>>> ma.group(2)  ##Why is this happening ???
Traceback (most recent call last):
  File "<pyshell#29>", line 1, in <module>
    ma.group(2)
IndexError: no such group

I guess I can't put capturing groups inside square brackets?

Upvotes: 2

Views: 322

Answers (3)

Ωmega

Reputation: 43683

Using .split(',') is the easiest way. However, if you want to use a regex, you could use

ma = re.search(r"^([^,]+),([^,]+)(?:,([^,]+))?", "Triangle,3.4,1.2") 
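As a rough sketch of how this pattern behaves on the data from the question: the optional third group comes back as None when a line has only one number. The variable names and the line-by-line loop below are my own assumptions, not part of the answer; matching per line matters because [^,] also matches newlines.

```python
import re

# Sample input copied from the question (shortened).
text = """circle,4.5
square,3.1
triangle,4.7,4.9"""

# The pattern from the answer above, compiled once for reuse.
pattern = re.compile(r"^([^,]+),([^,]+)(?:,([^,]+))?")

rows = []
for line in text.splitlines():
    m = pattern.search(line)
    if m:
        # groups() always has three entries; the last one is None
        # when the optional ",number" part is absent.
        rows.append(m.groups())

for row in rows:
    print(row)
```

Note that the captured numbers are still strings; convert them with float() if you need numeric values.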

Upvotes: 1

Martijn Pieters

Reputation: 1124288

Square brackets are special; they mark all characters inside of them as a character class. You are asking to match either a digit (\d), a , comma, a . full stop, a + plus sign, a ( opening parenthesis or a ) closing parenthesis. In other words, the opening and closing parentheses are part of the matched characters, not denoting a capturing group.
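A quick way to convince yourself of this is to test the bracketed part on its own. This is only an illustrative check; the variable names are made up for the example.

```python
import re

# The bracketed part of the question's pattern, taken on its own.
char_class = re.compile(r"[,(\d+.\d+)]+")

# Every character in this string is a member of the class,
# including the literal parentheses and the plus sign.
literal_match = char_class.fullmatch("(,+).5")

# A letter is not in the class, so this does not match.
letter_match = char_class.fullmatch("a")

# The full pattern from the question defines only one capturing group,
# which is why ma.group(2) raises IndexError.
question_pattern = re.compile(r"(\w+)[,(\d+.\d+)]+")
group_count = question_pattern.groups

print(literal_match is not None, letter_match is None, group_count)
```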

You don't need to use a character class at all here; you are looking for a more specific pattern of a number, followed by a full stop, followed by another number. Use a non-capturing group ((?:...)) to group the number format together with the comma to match repeating groups of numbers:

r"(\w+)(?:,(\d+\.\d+))+"

Unfortunately, this still won't capture more than one number for you; a regular expression never produces a variable number of groups, and a repeated group only keeps its last match. We've defined only two groups here, so that's all we get:

>>> import re
>>> ma = re.search(r"(\w+)(?:,(\d+\.\d+))+", "Triangle,3.4,1.2")
>>> ma.groups()
('Triangle', '1.2')

See "Regex question about parsing method signature" and "python regex repetition with capture question" for other SO questions that ran into this limitation.
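If you do want to stay with regular expressions, one common workaround is to capture the whole run of numbers as a single group and pull the individual numbers out in a second step. The sketch below assumes every number has the digits.digits form used in the question; parse_shape is a hypothetical helper name.

```python
import re

def parse_shape(line):
    """Return (name, [numbers]), or None if the line does not fit the format."""
    # Group 1 is the name; group 2 is the entire ",n.n,n.n..." tail.
    m = re.fullmatch(r"(\w+)((?:,\d+\.\d+)+)", line)
    if m is None:
        return None
    name, tail = m.groups()
    # Second pass: findall returns every number in the tail, not just the last.
    numbers = [float(x) for x in re.findall(r"\d+\.\d+", tail)]
    return name, numbers

print(parse_shape("Triangle,3.4,1.2"))
print(parse_shape("circle,2.0"))
```

The two-step approach sidesteps the fixed group count: the first match validates the line, and findall handles however many numbers follow.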

Your format is actually very simple, and you'd be much better off not using regular expressions at all. Simply split by the , comma and be done with it:

>>> "Triangle,3.4,1.2".split(',')
['Triangle', '3.4', '1.2']
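Applied to the whole multi-line string from the question, the split approach might look like the following sketch. The parse_line helper and the conversion to float are my own additions for illustration.

```python
# Sample input copied from the question.
text = """circle,4.5
square,3.1
circle,2.0
triangle,4.7,4.9
square,4.1
circle,4.3"""

def parse_line(line):
    # The first field is the shape name; everything after it is a number.
    name, *values = line.split(",")
    return name, [float(v) for v in values]

# Skip blank lines so stray empty lines in the input do not cause errors.
shapes = [parse_line(line) for line in text.splitlines() if line.strip()]

for name, values in shapes:
    print(name, values)
```

This handles one, two, or more numbers per line without any change to the code.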

Upvotes: 1

Rohit Jain

Reputation: 213351

Square brackets have a special meaning: they create a character class. Parentheses inside square brackets do not create capture groups; they are treated as literal ( and ) characters, and the class matches any single one of the characters listed. It will not match the pieces in sequence. You would have to use a group (parentheses) in place of the square brackets.

But you can simply use split in this case:

>>> s = "Triangle,3.4,1.2"  # named s to avoid shadowing the built-in str
>>> s.split(",")
['Triangle', '3.4', '1.2']
>>> 
>>> s = "circle,4.5"
>>> s.split(",")
['circle', '4.5']

>>> s.split(",")[0]
'circle'
>>> s.split(",")[1]
'4.5'

As for your regex:

ma = re.search(r"(\w+)[,(\d+.\d+)]+", "Triangle,3.4,1.2")

You are using the character class [,(\d+.\d+)], which matches any single one of these characters: a comma, a digit, a ., a +, a ( or a ).

You should change it to:

ma = re.search(r"(\w+)(,(\d+\.\d+))+", "Triangle,3.4,1.2")

But there is a problem in this case. You have created only 3 capturing groups, plus group 0 for the complete match:

group 0 -> complete match
group 1 -> `Triangle` (the name)
group 2 -> `,1.2` (outer bracket)
group 3 -> `1.2`  (inner bracket)

You will not get 3.4, because (,(\d+\.\d+)) first matches ,3.4 and then ,1.2. A repeated group only remembers its last match, so it keeps ,1.2.

Upvotes: 1
