AKS

Reputation: 19811

RegEx: Find all digits after certain string

I am trying to get all the digits that follow the word classes (or its variations) in the following string:

Accepted for all the goods and services in classes 16 and 41.

expected output:

16
41

I have multiple strings that follow this pattern, plus some others such as:

classes 5 et 30 # expected output 5, 30
class(es) 32,33 # expected output 32, 33
class 16        # expected output 16

Here is what I have tried so far: https://regex101.com/r/eU7dF6/3

(class[\(es\)]*)([and|et|,|\s]*(\d{1,}))+

But I only get the last matched number, i.e. 41 in the above example.
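A minimal sketch of why this happens, using a simplified pattern (an assumption, not the one from the regex101 link): when a capturing group sits inside a repeated group, each repetition overwrites the previous capture, so Python's re module reports only the last one.

```python
import re

# (\d+) is inside a repeated group: the first repetition captures "16",
# the second overwrites it with "41", and only "41" is kept.
m = re.search(r"classes(?:\s*(?:and|\s)*(\d+))+", "classes 16 and 41.")
print(m.group(0))  # the whole match covers both numbers
print(m.group(1))  # but the group holds only the last capture
```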

Upvotes: 2

Views: 654

Answers (2)

Wiktor Stribiżew

Reputation: 626738

I suggest grabbing all the substrings with numbers after class, classes or class(es), and then getting all the numbers from those:

import re
p = re.compile(r'\bclass(?:\(?es\)?)?(?:\s*(?:and|et|[,\s])?\s*\d+)+')
test_str = "Accepted for all the goods and services in classes 16 and 41."
results = [re.findall(r"\d+", x) for x in p.findall(test_str)]
print([x for l in results for x in l])
# => ['16', '41']

See IDEONE demo
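The same two-step idea can be wrapped in a small helper and run over every sample string from the question. This is only a sketch: extract_class_numbers and CLASS_RUN are hypothetical names, and the pattern is the one from this answer.

```python
import re

# Step 1 pattern: the keyword followed by one or more numbers separated
# by "and", "et", a comma or whitespace.
CLASS_RUN = re.compile(r'\bclass(?:\(?es\)?)?(?:\s*(?:and|et|[,\s])?\s*\d+)+')


def extract_class_numbers(text):
    """Return every number (as a string) that follows class/classes/class(es)."""
    numbers = []
    for run in CLASS_RUN.findall(text):
        # Step 2: split each matched run into its individual numbers.
        numbers.extend(re.findall(r'\d+', run))
    return numbers


samples = [
    "Accepted for all the goods and services in classes 16 and 41.",
    "classes 5 et 30",
    "class(es) 32,33",
    "class 16",
]
for s in samples:
    print(s, '->', extract_class_numbers(s))
```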

As the \G construct is not supported, and the Python re module gives no access to the capture stack, you cannot use your approach there.

However, you can do it the way you tried with the PyPI regex module.

>>> import regex
>>> test_str = "Accepted for all the goods and services in classes 16 and 41."
>>> rx = r'\bclass(?:\(?es\)?)?(?:\s*(?:and|et|[,\s])?\s*(?P<num>\d+))+'
>>> res = []
>>> for x in regex.finditer(rx, test_str):
...     res.extend(x.captures("num"))
...
>>> print(res)
['16', '41']
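If installing regex is not an option, \G-style chaining can be roughly emulated with the built-in re module, because Pattern.match(string, pos) only succeeds when the match starts exactly at pos. This is a substitute technique, not the approach from the answer; numbers_after_class, KEYWORD and NEXT_NUMBER are hypothetical names, and the separators handled (and, et, comma, whitespace) are assumed from the question's examples.

```python
import re

# Find the keyword anywhere with search(): class, classes or class(es).
KEYWORD = re.compile(r"\bclass(?:es|\(es\))?")
# One separator plus a number (separators assumed from the question).
# Used with match(text, pos) it must start exactly at pos, like \G.
NEXT_NUMBER = re.compile(r"(?:\s*(?:and|et|,)\s*|\s+)(\d+)")


def numbers_after_class(text):
    """Return the integers that directly follow each class keyword."""
    found = []
    pos = 0
    while True:
        kw = KEYWORD.search(text, pos)
        if kw is None:
            return found
        pos = kw.end()
        # Chain numbers while each one starts where the previous match ended.
        while True:
            m = NEXT_NUMBER.match(text, pos)
            if m is None:
                break
            found.append(int(m.group(1)))
            pos = m.end()


print(numbers_after_class("Accepted for all the goods and services in classes 16 and 41."))
```

The outer loop always advances pos past a keyword match, so it terminates even when a keyword is not followed by any number.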

Upvotes: 1

vks

Reputation: 67968

You can do it in two steps. The regex engine remembers only the last capture of a repeated group.

x="""Accepted for all the goods and services in classes 16 and 41."""
print re.findall(r"\d+",re.findall(r"class[\(es\)]*\s*(\d+(?:(?:and|et|,|\s)*\d+)*)",x)[0])

Output:['16', '41']

If you want integers rather than strings, use:

print(list(map(int, re.findall(r"\d+", re.findall(r"class[\(es\)]*\s*(\d+(?:(?:and|et|,|\s)*\d+)*)", x)[0]))))

Output: [16, 41]
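Indexing the outer findall result with [0] keeps only the first "class" run and raises IndexError when nothing matches. A sketch that loops over every run instead, reusing the capture pattern above unchanged (class_numbers and RUN are hypothetical names):

```python
import re

# The capture pattern from this answer, unchanged.
RUN = re.compile(r"class[\(es\)]*\s*(\d+(?:(?:and|et|,|\s)*\d+)*)")


def class_numbers(text):
    # Iterate over every captured run rather than taking [0], so later
    # "class" mentions are kept and a string without a match gives [].
    return [int(n) for run in RUN.findall(text) for n in re.findall(r"\d+", run)]


print(class_numbers("classes 16 and 41; class 5"))
```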

If you have to do it with a single regex, use the regex module:

import regex
x = """Accepted for all the goods and services in classes 16 and 41."""
# The first alternative finds the keyword; \G then chains each number onto the end of the previous match.
print([int(i) for i in regex.findall(r"class[\(es\)]*|\G(?:and|et|,|\s)*(\d+)", x, regex.VERSION1) if i])

Output: [16, 41]

Upvotes: 1
