siamii

Reputation: 24104

regex matching multiple repeating groups

I have the following string:

s = " 3434 garbage workorders: 138 waiting, 2 running, 3 failed, 134 completed"

I would like to parse the statuses and counts after "workorders". I've tried the following regex:

r = r"workorders:( (\d+) (\w+),?)*"

but this only returns the last group. How can I return all groups?

P.S. I know I could do this in Python, but I was wondering whether there's a pure regex solution.

>>> import re
>>> s = " 3434 garbage workorders: 138 waiting, 2 running, 3 failed, 134 completed"
>>> r = r"workorders:( (\d+) (\w+),?)*"
>>> re.findall(r, s)
[(' 134 completed', '134', 'completed')]

output should be close to

[('138', 'waiting'), ('2', 'running'), ('3', 'failed'), ('134', 'completed')]
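
To make the symptom concrete, here is a minimal sketch using only the standard `re` module and the string and pattern from above. It suggests that the pattern does consume all four items, but each capture group keeps only the text from its final repetition. The variable names `whole_match` and `last_groups` are just illustrative.

```python
import re

s = " 3434 garbage workorders: 138 waiting, 2 running, 3 failed, 134 completed"
pattern = r"workorders:( (\d+) (\w+),?)*"

m = re.search(pattern, s)

# group(0) is the full text consumed by all repetitions of the outer group.
whole_match = m.group(0)

# groups() holds one value per capture group: the value from its last repetition.
last_groups = m.groups()

print(whole_match)
print(last_groups)
```

So the repetition itself works; what gets lost is the per-iteration capture, which is why the answers below let `re.findall` do the repeating instead.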

Upvotes: 2

Views: 358

Answers (5)

mclslee

Reputation: 498

In my experience, it works better to narrow the string down as much as possible before applying a regex; running a regex over an arbitrary string tends to cause headaches.

In your case, try splitting on ':' (or even on 'workorders:') and keeping the part after it, so that only the status counts remain. After that, it's easy to get the count for each status.

import re

s = " 3434 garbage workorders: 138 waiting, 2 running, 3 failed, 134 completed"
statuses = s.split(':')  # [' 3434 garbage workorders', ' 138 waiting, 2 running, 3 failed, 134 completed']
statusesStr = statuses[1]  # ' 138 waiting, 2 running, 3 failed, 134 completed'

statusRe = re.compile(r"(\d+)\s*(\w+)")
statusRe.findall(statusesStr)  # [('138', 'waiting'), ('2', 'running'), ('3', 'failed'), ('134', 'completed')]

Edit: changed the expression to produce the desired output and to be more robust.
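
One possible variation on the split-then-match idea, as a sketch: `str.partition` returns an empty separator when the marker is missing, which avoids an `IndexError` on lines that have no "workorders:" part. The helper name `parse_statuses` and its `marker` parameter are made up for this example.

```python
import re

def parse_statuses(line, marker="workorders:"):
    """Return (count, status) pairs found after `marker`, or [] if it is absent."""
    _, found, tail = line.partition(marker)
    if not found:
        # Marker not present: nothing to parse.
        return []
    return re.findall(r"(\d+)\s*(\w+)", tail)

s = " 3434 garbage workorders: 138 waiting, 2 running, 3 failed, 134 completed"
result = parse_statuses(s)
print(result)
```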

Upvotes: 1

The fourth bird

Reputation: 163352

For the text in the example, you could try it like this:

(?:(\d+) (\w+)(?=,|$))+

Explanation

  • A non capturing group (?:
  • A capturing group for one or more digits (\d+)
  • A white space
  • A capturing group for one or more word characters (\w+)
  • A positive lookahead which asserts that what follows is either a comma or the end of the string (?=,|$)
  • Close the non capturing group and repeat that one or more times )+


That would give you:

[('138', 'waiting'), ('2', 'running'), ('3', 'failed'), ('134', 'completed')]
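
For reference, a short sketch of running that pattern with `re.findall` on the string from the question. Since `findall` already returns every non-overlapping match, the outer `(?: ... )+` does not appear to be needed for this input; the second call below drops it for comparison. The names `pairs` and `pairs_no_repeat` are just illustrative.

```python
import re

s = " 3434 garbage workorders: 138 waiting, 2 running, 3 failed, 134 completed"

# The pattern exactly as given above.
pairs = re.findall(r"(?:(\d+) (\w+)(?=,|$))+", s)

# Same pattern without the outer repeated group.
pairs_no_repeat = re.findall(r"(\d+) (\w+)(?=,|$)", s)

print(pairs)
print(pairs_no_repeat)
```

The lookahead is also what keeps "3434 garbage" out of the result, because that pair is followed by a space rather than a comma or the end of the string.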

Upvotes: 2

simsosims

Reputation: 285

This will give you exactly the output you want.

pairs = re.findall(r'(\d+) ([A-Za-z]+)', s.split("workorders:")[1])

You can then turn this into a dict:

x = {v: int(k) for k, v in pairs}

Upvotes: 0

Jt Miclat

Reputation: 144

An answer that looks only at the text after the colon:

re.findall(r'(?: )\d+ \w+', s.split(':', 1)[1])

Upvotes: 0

eLRuLL

Reputation: 18799

This should work for your particular case:

re.findall(r'[:,] (\d+)', s)
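
Note that a pattern with a single `(\d+)` group returns only the counts. As a sketch of one way to get the statuses too, a second group can be added after the number; this extension is an assumption about what the question needs, not part of the original answer.

```python
import re

s = " 3434 garbage workorders: 138 waiting, 2 running, 3 failed, 134 completed"

# One capture group: only the numbers that follow ':' or ','.
counts = re.findall(r"[:,] (\d+)", s)

# Two capture groups: (count, status) pairs.
pairs = re.findall(r"[:,] (\d+) (\w+)", s)

print(counts)
print(pairs)
```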

Upvotes: 1
