python re match groups

Question

I want to extract some fields from a string, but I don't know in advance how many there are. I tried a regexp, but I ran into some problems that I don't understand.

For example:

  199  -> (199)
  199,200  -> (199,200)
  300,20,500 -> (300,20, 500)

I tried, but somehow I can't get this to work. I hope someone can give me some advice; I would appreciate it.

The regex I tried:

>>> re.match('^(\d+,)*(\d+)$', '20,59,199,300').groups()
('199,', '300')
// Here I don't really care about the ',' since I could use .strip(',') to trim it.

I did some googling and tried re.findall, but I'm not sure how to get what I want from it:

>>> re.findall('^(\d+,)*(\d+)$', '20,59,199,300')
[('199,', '300')]

------------------------------------------------------update

I realize that without the whole story this question can be confusing. Basically, I want to validate the syntax defined in crontab (or something similar).

I created a structure, _VALID_EXPRESSION, made of nested tuples:

 (field_1,
  field_2,
 )

Each field, such as field_1, holds two tuples:

 field_1:   ((0,59),        (r'....', r'....'))
            valid_value   valid_format

In my code, it looks like this:

_VALID_EXPRESSION =  \
    (((0, 59), (r'^\*$', r'^\*/(\d+)$', r'^(\d+)-(\d+)$',
                r'^(\d+)-(\d+)/(\d+)$', r'^(\d+,)*(\d+)$')),   # second
     ((0, 59), (r'^\*$', r'^\*\/(\d+)$', r'^(\d+)-(\d+)$',
                r'^(\d+)-(\d+)/(\d+)$', r'^(\d+,)*(\d+)$')),   # minute
       .... )

In my parse function, all I have to do is extract all the groups and check whether they are within the valid range.

One of the regexps I need must correctly match a string such as '50,200,300' and extract all of the numbers in it. (I could use split(), of course, but that would defeat my original intention, so I dislike that idea.)

I hope this helps.

abarnert · Accepted Answer

The simplest solution with a regex is this:

r"(\d+,?)"

You can use findall to get the '300,', '20,', and '500' that you want. Or, if you don't want the commas:

r"(\d+),?"

This matches a group of 1 or more digits, followed by 0 or 1 commas (not in the group).

Either way:

>>> s = '300,20,500'
>>> r = re.compile(r"(\d+),?")
>>> r.findall(s)
['300', '20', '500']

However, as Sahil Grover points out, if those are your input strings, this is equivalent to just calling s.split(','). If your input strings might have non-digits, then this will ensure you only match digit strings, but even that would probably be simpler as filter(str.isdigit, s.split(',')).
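A minimal sketch of that comparison, assuming a made-up input string (`mixed` is hypothetical, not from the question) that contains a non-digit field and an empty field. In Python 3, `filter` returns a lazy iterator, so it is wrapped in `list` here. The two approaches agree on clean input but can differ on messy input: the regex picks up any run of digits, even inside a field like 'x20', while split-and-filter drops that field entirely.

```python
import re

# Hypothetical messy input: one field with a letter, one empty field.
mixed = '300,x20,,500'

# split-and-filter: keeps only fields made up entirely of digits
by_split = list(filter(str.isdigit, mixed.split(',')))

# regex: finds every run of digits, wherever it appears in the string
by_regex = re.findall(r"(\d+),?", mixed)

print(by_split)  # ['300', '500']
print(by_regex)  # ['300', '20', '500']
```

Which behavior is right depends on whether a field like 'x20' should count as invalid or as the number 20.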

If you want a tuple of ints instead of a list of strs:

>>> tuple(map(int, r.findall(s)))
(300, 20, 500)

If you find comprehensions/generator expressions easier to read than map/filter calls:

>>> tuple(int(x) for x in r.findall(s))
(300, 20, 500)

Or, more simply:

>>> tuple(int(x) for x in s.split(',') if x.isdigit())
(300, 20, 500)

And if you want the string (300,20,500), while you can of course get something similar by calling repr on the tuple (which gives '(300, 20, 500)', with spaces), there's a much easier way when the input is already clean:

>>> '(' + s + ')'
'(300,20,500)'

Your original regex:

'^(\d+,)*(\d+)$'

… is going to return exactly two groups, because you have exactly two groups in the pattern. And, since you're explicitly wrapping it in ^ and $, it has to match the entire string, so findall isn't going to help you here—it's going to find the exact same one match (of two groups) as match.
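To make this concrete: a repeated capturing group such as (\d+,)* only remembers its last repetition, which is why '199,' is all that survives. One possible workaround for the validation use case, sketched here under assumptions, is to keep the original pattern to check the overall shape and use a second, simpler pattern to pull out the numbers. The helper name `parse_list` and the default 0 to 59 range are made up for illustration and are not from the original code.

```python
import re

LIST_RE = re.compile(r'^(\d+,)*(\d+)$')  # the original pattern: validates the shape
NUM_RE = re.compile(r'\d+')              # extracts each individual number


def parse_list(field, low=0, high=59):
    """Return a tuple of ints, or None if the field is malformed or out of range.

    Hypothetical helper; the range defaults are assumptions for illustration.
    """
    if not LIST_RE.match(field):
        return None
    values = tuple(int(x) for x in NUM_RE.findall(field))
    if all(low <= v <= high for v in values):
        return values
    return None


# The repeated group only keeps its final repetition:
print(LIST_RE.match('20,59,199,300').groups())  # ('199,', '300')

print(parse_list('5,10,59'))     # (5, 10, 59)
print(parse_list('5,,10'))       # None: malformed list
print(parse_list('50,200,300'))  # None: 200 and 300 are out of range
```

This keeps the regex as the syntax check, which seems to be the original intention, while leaving the extraction to findall.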
