Reputation: 567
I need to extract the real issue number from my file names. There are 2 patterns (with or without a leading number that should be ignored):
asdasd 213.pdf ---> 213
abcd123efg456.pdf ---> 123
123abcd 4567sdds.pdf ---> 4567, since 123 is ignored
890abcd 123efg456.pdf ---> 123, since 890 is ignored
I want to know whether it is possible to do this with a single regular expression. Currently, my solution involves 2 steps: strip any leading number, then take the first number in what remains. In Python code:
import re

reNumHeading = re.compile(r'^\d+')  # to find a leading number
reNum = re.compile(r'\d+')          # to find any number

lstTest = '''123abcd 4567sdds.pdf
asdasd 213.pdf
abcd 123efg456.pdf
890abcd 123efg456.pdf'''.split('\n')

for test in lstTest:
    heading = reNumHeading.match(test)
    if heading:
        # skip past the leading number
        stripTest = test[heading.end():]
    else:
        stripTest = test
    result = reNum.findall(stripTest)
    if result:
        print(result[0])
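For comparison, the same two-step logic can be condensed into a small helper. This is only a sketch: the function name `issue_number_two_step` is hypothetical, and it uses `re.sub` to drop the leading digits instead of slicing by the match span.

```python
import re

def issue_number_two_step(filename):
    """Hypothetical helper: two-step version of the logic above."""
    # Step 1: drop any run of digits at the very start of the name.
    rest = re.sub(r'^\d+', '', filename)
    # Step 2: return the first number left in the remainder, or None.
    m = re.search(r'\d+', rest)
    return m.group() if m else None

for name in ['123abcd 4567sdds.pdf', 'asdasd 213.pdf',
             'abcd 123efg456.pdf', '890abcd 123efg456.pdf']:
    print(issue_number_two_step(name))
```

It still runs two regex operations per name, which is what the question is asking to avoid.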
thanks
Upvotes: 2
Views: 510
Reputation: 91430
Just match digits \d+ that follow a non-digit \D:
import re

lstTest = '''123abcd 4567sdds.pdf
asdasd 213.pdf
abcd 123efg456.pdf
890abcd 123efg456.pdf'''.split('\n')

for test in lstTest:
    res = re.search(r'\D(\d+)', test)
    print(res.group(1))
Output:
4567
213
123
123
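One caveat with calling `res.group(1)` directly: `re.search` returns `None` when nothing matches, for example for a name that is only a leading number such as `213.pdf` (a hypothetical input, not one of the question's examples), and then `.group` raises `AttributeError`. A minimal guarded sketch, with a hypothetical function name:

```python
import re

def issue_number(filename):
    """Return the first run of digits that follows a non-digit, or None."""
    res = re.search(r'\D(\d+)', filename)
    # re.search returns None when no digit run is preceded by a non-digit.
    return res.group(1) if res else None

print(issue_number('890abcd 123efg456.pdf'))  # 123
print(issue_number('213.pdf'))                # None
```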
Upvotes: 3
Reputation: 23667
You can use the ? quantifier to define an optional pattern:
>>> import re
>>> s = '''asdasd 213.pdf
... abcd123efg456.pdf
... 123abcd 4567sdds.pdf
... 890abcd 123efg456.pdf'''
>>> for line in s.split('\n'):
...     print(re.search(r'(?:^\d+)?.*?(\d+)', line)[1])
...
213
123
4567
123
- (?:^\d+)? here a non-capturing group and the ? quantifier are used to optionally match digits at the start of the line
  - since + is greedy, all the leading digits will be matched
- .*? matches any number of characters minimally (because we need the first match of digits)
- (\d+) captures the required digits
- re.search returns a re.Match object from which you can get various details
- [1] on the re.Match object gives you the string captured by the first capturing group
  - use .group(1) if you are on an older version of Python that doesn't support the [1] syntax (it was added in Python 3.6)

See also: Reference - What does this regex mean?
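A caveat worth checking: because the regex engine backtracks, (?:^\d+)? can give up some of the leading digits when nothing else matches. For a name with no number after the leading one, such as the hypothetical `123.pdf`, the pattern captures `3` instead of failing (the question's two-step code prints nothing for that name). The sketch below, assuming "no match" is the desired result there, anchors the pattern and requires at least one non-digit after the leading digits. `OPTIONAL_PREFIX` and `ANCHORED` are just illustrative names.

```python
import re

OPTIONAL_PREFIX = re.compile(r'(?:^\d+)?.*?(\d+)')
# \D+ needs a non-digit right after ^\d*, so the engine can never split the
# leading number: ^\d* takes all of it (or nothing if the name starts with a non-digit).
ANCHORED = re.compile(r'^\d*\D+(\d+)')

for name in ['123abcd 4567sdds.pdf', 'abcd123efg456.pdf', '123.pdf']:
    m1 = OPTIONAL_PREFIX.search(name)
    m2 = ANCHORED.search(name)
    print(name, m1[1] if m1 else None, m2[1] if m2 else None)
```

For `123.pdf` the first pattern gives `3` while the anchored one gives `None`; for the question's own examples both give the same result.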
Upvotes: 3