Reputation: 24104
I have the following string:
s = " 3434 garbage workorders: 138 waiting, 2 running, 3 failed, 134 completed"
I would like to parse the statuses and counts after "workorders". I've tried the following regex:
r = r"workorders:( (\d+) (\w+),?)*"
but this only returns the last repetition of the group. How can I get all of them?
P.S. I know I could do this in Python, but I was wondering if there's a pure-regex solution.
>>> import re
>>> s = " 3434 garbage workorders: 138 waiting, 2 running, 3 failed, 134 completed"
>>> r = r"workorders:( (\d+) (\w+),?)*"
>>> re.findall(r, s)
[(' 134 completed', '134', 'completed')]
>>>
The output should be close to:
[('138', 'waiting'), ('2', 'running'), ('3', 'failed'), ('134', 'completed')]
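In Python's re module, a capturing group inside a repetition keeps only the text from its last iteration, which is why findall returns just the final pair. A minimal sketch of the usual pure-regex workaround, assuming the input always looks like the sample string: match one pair at a time and let re.findall collect every match. The lookbehind (?<=[:,] ) is an illustrative choice, not taken from any of the answers.

```python
import re

s = " 3434 garbage workorders: 138 waiting, 2 running, 3 failed, 134 completed"

# The question's pattern: the group is repeated by "*", but re only keeps
# what each capturing group matched on its last iteration.
repeated = re.findall(r"workorders:( (\d+) (\w+),?)*", s)

# One "<count> <status>" pair per match instead; the lookbehind requires
# ": " or ", " immediately before the count, which skips "3434 garbage".
pairs = re.findall(r"(?<=[:,] )(\d+) (\w+)", s)

print(repeated)  # [(' 134 completed', '134', 'completed')]
print(pairs)     # [('138', 'waiting'), ('2', 'running'), ('3', 'failed'), ('134', 'completed')]
```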
Upvotes: 2
Views: 358
Reputation: 498
In my experience, it's better to preprocess the string as much as possible before applying a regex; running a regex over an arbitrary string will only cause headaches.
In your case, try splitting on ':' (or even on 'workorders:') and keeping only the part after it, which contains just the status counts. From there it's easy to get the count for each status.
import re
s = " 3434 garbage workorders: 138 waiting, 2 running, 3 failed, 134 completed"
statuses = s.split(':', 1)  # [' 3434 garbage workorders', ' 138 waiting, 2 running, 3 failed, 134 completed']
statusesStr = statuses[1]  # ' 138 waiting, 2 running, 3 failed, 134 completed'
statusRe = re.compile(r"(\d+)\s*(\w+)")  # raw string so \d, \s, \w reach the regex engine unchanged
statusRe.findall(statusesStr)  # [('138', 'waiting'), ('2', 'running'), ('3', 'failed'), ('134', 'completed')]
Edit: changed the expression to produce the desired output and to make it more robust.
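The split-then-regex approach can be wrapped in a small function. This is a sketch: parse_workorders is a hypothetical name, and partitioning on 'workorders:' (rather than splitting on ':') is an assumption meant to keep working if the garbage prefix ever contains a colon.

```python
import re

def parse_workorders(text):
    """Hypothetical helper: return (count, status) pairs after 'workorders:'."""
    # partition splits on the first occurrence only; sep is '' if absent.
    _, sep, tail = text.partition("workorders:")
    if not sep:
        return []
    return re.findall(r"(\d+)\s*(\w+)", tail)

s = " 3434 garbage workorders: 138 waiting, 2 running, 3 failed, 134 completed"
print(parse_workorders(s))
# [('138', 'waiting'), ('2', 'running'), ('3', 'failed'), ('134', 'completed')]
```

Returning an empty list when the marker is missing is one design choice; raising an error would be equally reasonable if the marker is always expected.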
Upvotes: 1
Reputation: 163352
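For the text in the example, you could try it like this:
(?: (\d+) (\w+)(?=,|$))+
Explanation
(?: Start a non-capturing group
A space, then (\d+): capture 1+ digits in group 1
A space, then (\w+): capture 1+ word characters in group 2
(?=,|$) Positive lookahead: assert that a comma or the end of the string comes next
)+ Close the group and repeat it 1+ times
Because each pair is followed by a comma rather than a space, the group never repeats past a single pair, so re.findall returns one (count, status) pair per match.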
That would give you:
[('138', 'waiting'), ('2', 'running'), ('3', 'failed'), ('134', 'completed')]
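A runnable check of that pattern with re.findall. The pattern is assembled from the explanation tokens with a space assumed before each capture group; without those spaces the expected output would not come out.

```python
import re

s = " 3434 garbage workorders: 138 waiting, 2 running, 3 failed, 134 completed"

# Explanation tokens joined, assuming a space before each capture group.
pattern = r"(?: (\d+) (\w+)(?=,|$))+"

# After each pair comes "," rather than the space the group needs to start
# again, so "+" matches once and findall returns one pair per match.
result = re.findall(pattern, s)
print(result)
# [('138', 'waiting'), ('2', 'running'), ('3', 'failed'), ('134', 'completed')]
```

The "3434 garbage" prefix is skipped because it is followed by a space, which fails the (?=,|$) lookahead.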
Upvotes: 2
Reputation: 285
This gives you exactly the output you asked for:
import re
pairs = re.findall(r'(\d+) ([A-Za-z]+)', s.split("workorders:")[1])  # avoid naming this "map", which shadows the builtin
You can then turn this into a dict:
counts = {v: int(k) for k, v in pairs}
Upvotes: 0
Reputation: 144
An answer that only looks at the pairs after the ':' (and the ',' separators that follow it):
re.findall(r'[:,] (\d+) (\w+)', s)
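To illustrate why the match has to be tied to the ':' or ',' before each pair, here is a comparison (a sketch on the sample string only) between a pattern that requires just a leading space and one that requires ':' or ',' before that space.

```python
import re

s = " 3434 garbage workorders: 138 waiting, 2 running, 3 failed, 134 completed"

# Requiring only a leading space also matches the " 3434 garbage" prefix.
loose = re.findall(r" (\d+) (\w+)", s)

# Requiring ':' or ',' before the space keeps just the status pairs.
anchored = re.findall(r"[:,] (\d+) (\w+)", s)

print(loose)
# [('3434', 'garbage'), ('138', 'waiting'), ('2', 'running'), ('3', 'failed'), ('134', 'completed')]
print(anchored)
# [('138', 'waiting'), ('2', 'running'), ('3', 'failed'), ('134', 'completed')]
```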
Upvotes: 0
Reputation: 18799
This should work for your particular case:
re.findall(r'[:,] (\d+)', s)
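That pattern captures only the counts. A minimal sketch of one way to keep the same [:,] anchor but also capture the status, using named groups with re.finditer; the group names count and status are hypothetical choices.

```python
import re

s = " 3434 garbage workorders: 138 waiting, 2 running, 3 failed, 134 completed"

# The answer's pattern: one group, so findall returns the counts only.
counts_only = re.findall(r"[:,] (\d+)", s)

# Hypothetical extension: named groups for the count and the status word.
status_re = re.compile(r"[:,] (?P<count>\d+) (?P<status>\w+)")
by_status = {m["status"]: int(m["count"]) for m in status_re.finditer(s)}

print(counts_only)  # ['138', '2', '3', '134']
print(by_status)    # {'waiting': 138, 'running': 2, 'failed': 3, 'completed': 134}
```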
Upvotes: 1