user8675309

Reputation: 181

Grabbing multiple patterns in a string using regex

In Python, I'm trying to grab multiple values from a string using a regular expression, but I'm having trouble. For the string:

inputs       =    12 1  345 543 2

I tried using:

match = re.match(r'\s*inputs\s*=(\s*\d+)+',string)

However, this only returns the value '2'. I'm trying to capture all the values '12', '1', '345', '543', '2', but I'm not sure how to do this.

Any help is greatly appreciated!

EDIT: Thank you all for explaining why this does not work and for providing alternative suggestions. Sorry if this is a repeat question.

Upvotes: 0

Views: 165

Answers (4)

Inbar Rose

Reputation: 43507

You can nest one regular expression call inside another:

import re
s = 'inputs       =    12 1  345 543 2'
print(re.findall(r'(\d+)', re.match(r'inputs\s*=\s*([\s\d]+)', s).group(1)))
>>> 
['12', '1', '345', '543', '2']

Or do it in layers:

import re

def get_inputs(s, regex=r'inputs\s*=\s*([\s\d]+)'):
    match = re.match(regex, s)
    if not match:
        return False # or raise an exception - whatever you want
    else:
        return re.findall(r'(\d+)', match.group(1))

s = 'inputs       =    12 1  345 543 2'
print(get_inputs(s))
>>> 
['12', '1', '345', '543', '2']

Upvotes: 1

Martin Ender

Reputation: 44289

You cannot do this with a single regex (unless you were using .NET), because each capturing group will only ever return one result even if it is repeated (the last one in the case of Python).

Since variable length lookbehinds are also not possible (in which case you could do (?<=inputs.*=.*)\d+), you will have to separate this into two steps:

match = re.match(r'\s*inputs\s*=\s*(\d+(?:\s+\d+)*)', string)
integers = re.split(r'\s+',match.group(1))

So now you capture the entire list of integers (and the spaces between them), and then you split that capture at the spaces.

The second step could also be done using findall:

integers = re.findall(r'\d+',match.group(1))

The results are identical.
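As a rough end-to-end sketch of the two steps above, the snippet below runs both variants on the string from the question. The pattern here uses \s+ between integers and * for the repetition so that a single integer also matches, and the None check and error message are my own additions for illustration, not part of the answer.

```python
import re

s = 'inputs       =    12 1  345 543 2'

# Step 1: capture the whole run of integers (and the whitespace between them).
match = re.match(r'\s*inputs\s*=\s*(\d+(?:\s+\d+)*)', s)
if match is None:
    # Assumption: raising is acceptable when the line has no "inputs = ..." part.
    raise ValueError('no "inputs = ..." assignment found')

# Step 2: break the captured run into individual integers, in two ways.
by_split = re.split(r'\s+', match.group(1))
by_findall = re.findall(r'\d+', match.group(1))

print(by_split)
print(by_findall)
```

Both lists hold the integers as strings; wrap each element in int() if numeric values are needed.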

Upvotes: 1

Lllama

Reputation: 372

You should look at this answer: https://stackoverflow.com/a/4651893/1129561

In short:

In Python, this isn’t possible with a single regular expression: each capture of a group overrides the last capture of that same group (in .NET, this would actually be possible since the engine distinguishes between captures and groups).
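The overwriting behaviour can be seen directly with the pattern from the question. This is only a small demonstration using the question's sample string; the variable names are mine.

```python
import re

s = 'inputs       =    12 1  345 543 2'

# The group (\s*\d+) is repeated five times while matching,
# but each repetition overwrites the previous capture.
match = re.match(r'\s*inputs\s*=(\s*\d+)+', s)

last_capture = match.group(1)
print(repr(last_capture))   # ' 2' (the final repetition, including its leading space)
print(match.groups())       # a one-element tuple: only one capture survives
```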

Upvotes: 0

mohit6up

Reputation: 4348

You could try something like: re.findall(r"\d+", your_string).
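A quick sketch of this one-liner on the string from the question. The second string is a made-up input, included only to show that this approach picks up every run of digits in the string, not just those after "inputs =".

```python
import re

your_string = 'inputs       =    12 1  345 543 2'
values = re.findall(r'\d+', your_string)
print(values)

# Hypothetical input: a digit in the name is returned as well.
other_string = 'inputs2 = 7 8'
other_values = re.findall(r'\d+', other_string)
print(other_values)   # ['2', '7', '8']
```

If the string can contain digits elsewhere, matching the "inputs =" prefix first is the safer route.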

Upvotes: 2
