Reputation: 19811
I am trying to get all the numbers that follow the word classes (or its variations) in the following string:
Accepted for all the goods and services in classes 16 and 41.
expected output:
16
41
I have multiple strings which follow this pattern, and some others such as:
classes 5 et 30 # expected output 5, 30
class(es) 32,33 # expected output 32, 33
class 16 # expected output 16
Here is what I have tried so far: https://regex101.com/r/eU7dF6/3
(class[\(es\)]*)([and|et|,|\s]*(\d{1,}))+
But I am only able to get the last matched number, i.e. 41 in the above example.
Upvotes: 2
Views: 654
Reputation: 626738
I suggest grabbing all the substrings with numbers after class, classes or class(es), and then getting all the numbers from those:
import re
p = re.compile(r'\bclass(?:\(?es\)?)?(?:\s*(?:and|et|[,\s])?\s*\d+)+')
test_str = "Accepted for all the goods and services in classes 16 and 41."
results = [re.findall(r"\d+", x) for x in p.findall(test_str)]
print([x for l in results for x in l])
# => ['16', '41']
See IDEONE demo
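The same two-step idea can be wrapped in a small helper and tried against the other sample strings from the question. This is only a sketch: the function name extract_class_numbers is made up for illustration, and the pattern is the one shown above, unchanged.

```python
import re

# Step 1 pattern (from the answer above): "class", "classes" or "class(es)"
# followed by one or more numbers joined by "and", "et", commas or whitespace.
CLASS_RE = re.compile(r'\bclass(?:\(?es\)?)?(?:\s*(?:and|et|[,\s])?\s*\d+)+')

def extract_class_numbers(text):
    """Return every number that follows class/classes/class(es) as strings."""
    numbers = []
    for chunk in CLASS_RE.findall(text):
        # Step 2: pull the individual numbers out of each matched chunk.
        numbers.extend(re.findall(r"\d+", chunk))
    return numbers

samples = [
    "Accepted for all the goods and services in classes 16 and 41.",
    "classes 5 et 30",
    "class(es) 32,33",
    "class 16",
]
for s in samples:
    print(extract_class_numbers(s))
```

Because the pattern contains no capturing groups, findall returns the whole matched substring, which is what the second step needs.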
Since Python's re module supports neither the \G construct nor access to the capture stack of a repeated group, your single-regex approach cannot work with it: a repeated capturing group only keeps its last match.
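To see the limitation concretely, here is a minimal sketch that runs the pattern from the question through the standard re module. The variable names are mine; the pattern and the test string are copied from the question.

```python
import re

# The pattern from the question, unchanged.
pattern = r'(class[\(es\)]*)([and|et|,|\s]*(\d{1,}))+'
text = "Accepted for all the goods and services in classes 16 and 41."

m = re.search(pattern, text)
keyword = m.group(1)      # the "classes" keyword
last_number = m.group(3)  # group 3 was matched twice, but only the last match is kept

print(keyword)      # classes
print(last_number)  # 41
```

The first number, 16, is matched during the first repetition and then overwritten, so it cannot be read back from the match object.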
However, with the PyPI regex module you can keep your single-regex approach, because it exposes every capture of a repeated group:
>>> import regex
>>> test_str = "Accepted for all the goods and services in classes 16 and 41."
>>> rx = r'\bclass(?:\(?es\)?)?(?:\s*(?:and|et|[,\s])?\s*(?P<num>\d+))+'
>>> res = []
>>> for x in regex.finditer(rx, test_str):
...     res.extend(x.captures("num"))
...
>>> print(res)
['16', '41']
Upvotes: 1
Reputation: 67968
You can do it in 2 steps. The regex engine remembers only the last capture of a repeated group.
import re
x = """Accepted for all the goods and services in classes 16 and 41."""
# Step 1 captures the whole number list; step 2 extracts the numbers from it.
print(re.findall(r"\d+", re.findall(r"class[\(es\)]*\s*(\d+(?:(?:and|et|,|\s)*\d+)*)", x)[0]))
Output: ['16', '41']
If you don't want strings, use
import ast
print(list(map(ast.literal_eval, re.findall(r"\d+", re.findall(r"class[\(es\)]*\s*(\d+(?:(?:and|et|,|\s)*\d+)*)", x)[0]))))
Output: [16, 41]
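Note that indexing with [0] raises an IndexError when the text contains no class list at all. A possible variant, sketched below, loops over the matches instead and converts with int() rather than ast.literal_eval. The name class_numbers is hypothetical; the pattern is the same one used above.

```python
import re

# Same outer pattern as above: the group captures the whole number list.
LIST_RE = re.compile(r"class[\(es\)]*\s*(\d+(?:(?:and|et|,|\s)*\d+)*)")

def class_numbers(text):
    """Return the class numbers found in text as ints, or [] if there are none."""
    numbers = []
    for number_list in LIST_RE.findall(text):
        # Split each captured list ("16 and 41", "32,33", ...) into its numbers.
        numbers.extend(int(n) for n in re.findall(r"\d+", number_list))
    return numbers

print(class_numbers("Accepted for all the goods and services in classes 16 and 41."))
print(class_numbers("no numbers here"))
```

Looping over findall also covers texts that mention class lists more than once.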
If you have to do it in one regex, use the regex module:
import ast
import regex
x = """Accepted for all the goods and services in classes 16 and 41."""
print([ast.literal_eval(i) for i in regex.findall(r"class[\(es\)]*|\G(?:and|et|,|\s)*(\d+)", x, regex.VERSION1) if i])
Output: [16, 41]
Upvotes: 1