Reputation: 33
Stuck with the following issue:
I have a string 'ABC.123.456XX' and I want to use a regex to extract the 3 numeric characters that come after the second period. I'm really struggling with this and would appreciate any new insights. This is the closest I got, but it's not really close to what I want:
'.*\.(.*?\.\d{3})'
I appreciate any help in advance - thanks.
Upvotes: 3
Views: 1362
Reputation:
Match non-dot characters, then "dot followed by non-dot characters" twice; the 3 digits that follow end up in capture group 1:
[^.]*(?:\.[^.]*){2}(\d{3})
https://regex101.com/r/qWpfHx/1
Expanded
[^.]*
(?: \. [^.]* ){2}
( \d{3} ) # (1)
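For anyone who wants to try the expanded form from Python, here is a minimal sketch. It assumes the standard `re` module with `re.VERBOSE`, so the whitespace and `#` comments inside the pattern are ignored; the names `pattern`, `match` and `digits` are only illustrative.

```python
import re

# Expanded form of the pattern, compiled in verbose mode.
pattern = re.compile(r"""
    [^.]*               # leading run of non-dot characters
    (?: \. [^.]* ){2}   # a dot followed by non-dot characters, twice
    ( \d{3} )           # (1) three digits
""", re.VERBOSE)

match = pattern.search("ABC.123.456XX")
digits = match.group(1) if match else None
print(digits)  # 456
```

Note that the second `[^.]*` has to backtrack before `\d{3}` can match, so the digits do not need to sit directly after the second dot: for an input such as 'ABC.123.XX456' this pattern would also capture 456.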
Upvotes: 1
Reputation: 2783
If your input will always be in a similar format, like xxx.xxx.xxxxx, then one solution is string manipulation:
>>> s = 'ABC.123.456XX'
>>> '.'.join(s.split('.')[2:])[0:3]
'456'
Explanation
In the line '.'.join(s.split('.')[2:])[0:3]:
s.split('.') splits the string into the list ['ABC', '123', '456XX']
'.'.join(s.split('.')[2:]) joins the remainder of the list after the second element, so '456XX'
[0:3] selects the substring from index 0 to index 2 (inclusive), so the result is '456'
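The steps above can be wrapped in a small helper. This is only a sketch: the function name `digits_after_second_dot` is hypothetical, and the length and `isdigit()` checks are an added assumption (the one-liner itself does not verify that the three characters are digits).

```python
def digits_after_second_dot(s):
    """Return the 3 digits right after the second '.', or None."""
    parts = s.split('.')
    if len(parts) < 3:
        # Fewer than two dots in the input.
        return None
    # Same slicing as the one-liner: rejoin the tail, take 3 characters.
    candidate = '.'.join(parts[2:])[0:3]
    # Assumption: reject anything that is not exactly three digits.
    if len(candidate) == 3 and candidate.isdigit():
        return candidate
    return None


print(digits_after_second_dot('ABC.123.456XX'))  # 456
```

Unlike a bare slice, this returns None for inputs such as 'ABC.123' or 'ABC.123.4X' instead of a short or non-numeric string.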
Upvotes: 2
Reputation: 27743
This expression might also work:
[^\r\n.]+\.[^\r\n.]+\.([0-9]{3})
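import re

# Two dot-terminated segments (no dots or line breaks inside),
# then capture the next three digits.
regex = r'[^\r\n.]+\.[^\r\n.]+\.([0-9]{3})'
string = '''
ABC.123.456XX
ABCOUOU.123123123.000871XX
ABCanything_else.123123123.111871XX
'''
print(re.findall(regex, string))
# Output: ['456', '000', '111']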
If you wish to simplify/modify/explore the expression, it's explained in the top-right panel of regex101.com, where you can also watch how it matches against some sample inputs.
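One caveat worth sketching: the expression is not anchored, so on a line with more than two dots it can start matching at a later segment. The variant below is an assumption on my part, not part of the original expression; it adds `^` with `re.MULTILINE` so that only the first two segments of each line are counted. The names `unanchored`, `anchored` and `sample` are illustrative.

```python
import re

unanchored = r'[^\r\n.]+\.[^\r\n.]+\.([0-9]{3})'
# Assumed variant: same pattern, anchored to the start of each line.
anchored = r'^[^\r\n.]+\.[^\r\n.]+\.([0-9]{3})'

sample = '''
ABC.123.456XX
ABCOUOU.123123123.000871XX
ABCanything_else.123123123.111871XX
'''

print(re.findall(anchored, sample, re.MULTILINE))
# ['456', '000', '111']

# With three dots, the unanchored pattern starts at the second segment.
print(re.findall(unanchored, 'a.b.c.123xx'))              # ['123']
print(re.findall(anchored, 'a.b.c.123xx', re.MULTILINE))  # []
```

Whether the anchored behaviour is what you want depends on whether inputs with extra dots should be rejected or not.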
Upvotes: 1