spyridon

Reputation: 33

Regex match adjacent digits after second occurrence of character

Stuck with the following issue:

I have a string 'ABC.123.456XX' and I want to use regex to extract the 3 numeric characters that come after the second period. I'm really struggling with this and would appreciate any new insights. This is the closest I got, but it's not really close to what I want:

'.*\.(.*?\.\d{3})'

I appreciate any help in advance - thanks.

Upvotes: 3

Views: 1362

Answers (3)

user12097764

Match a run of non-dot characters, then "dot followed by non-dots" twice; the 3 digits that follow land in capture group 1

[^.]*(?:\.[^.]*){2}(\d{3})

https://regex101.com/r/qWpfHx/1

Expanded

 [^.]* 
 (?: \. [^.]* ){2}
 ( \d{3} )                     # (1)
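For anyone trying this in Python, here is a minimal sketch of how the pattern behaves with the re module. The helper first_group and the tightened STRICT_PATTERN are illustrative assumptions, not part of the answer. Note that the last [^.]* is greedy and backtracks, so on a longer digit run the capture is the last three digits the engine can reach, not necessarily the first three after the period.

```python
import re

# Pattern from the answer above, unchanged.
ANSWER_PATTERN = re.compile(r'[^.]*(?:\.[^.]*){2}(\d{3})')

# Hypothetical tightened variant (an assumption, not from the answer):
# nothing may sit between the second period and the three digits.
STRICT_PATTERN = re.compile(r'[^.]*\.[^.]*\.(\d{3})')


def first_group(pattern, text):
    """Return capture group 1 of a match starting at index 0, or None."""
    match = pattern.match(text)
    return match.group(1) if match else None


print(first_group(ANSWER_PATTERN, 'ABC.123.456XX'))  # 456
print(first_group(STRICT_PATTERN, 'ABC.123.456XX'))  # 456

# With more than three digits after the second period the two differ.
longer = 'ABCOUOU.123123123.000871XX'
print(first_group(ANSWER_PATTERN, longer))  # 871
print(first_group(STRICT_PATTERN, longer))  # 000
```

Both patterns give 456 for the string in the question; which one is right for longer inputs depends on whether the digits must start immediately after the second period.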

Upvotes: 1

wcarhart

Reputation: 2783

If your input will always be in a similar format, like xxx.xxx.xxxxx, then one solution is string manipulation:

>>> s = 'ABC.123.456XX'
>>> '.'.join(s.split('.')[2:])[0:3]
'456'

Explanation

In the line '.'.join(s.split('.')[2:])[0:3]:

  • s.split('.') splits the string into the list ['ABC', '123', '456XX']
  • '.'.join(s.split('.')[2:]) rejoins everything from index 2 onward (the part after the second period), giving '456XX'
  • [0:3] takes the first three characters (indices 0 through 2), so the result is '456'
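The steps above can be wrapped in a small function that also copes with input that does not fit the expected format. This is only a sketch: the name digits_after_second_period and the choice to return None on bad input are assumptions for illustration, not something from the answer.

```python
def digits_after_second_period(s, count=3):
    """Return the `count` digits right after the second period, or None.

    Hypothetical helper: the name and the None-on-failure behaviour are
    illustrative choices.
    """
    # maxsplit=2 keeps everything after the second period in one piece.
    parts = s.split('.', 2)
    if len(parts) < 3:
        return None  # fewer than two periods in the input

    candidate = parts[2][:count]
    # isdigit() also accepts some non-ASCII digit characters.
    if len(candidate) == count and candidate.isdigit():
        return candidate
    return None


print(digits_after_second_period('ABC.123.456XX'))  # 456
print(digits_after_second_period('ABC.123'))        # None
print(digits_after_second_period('ABC.123.4X'))     # None
```

Unlike the one-liner, this checks that the three characters really are digits, which matters if the input is not always shaped like xxx.xxx.xxxxx.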

Upvotes: 2

Emma

Reputation: 27743

This expression may also work:

[^\r\n.]+\.[^\r\n.]+\.([0-9]{3})

Test

import re

regex = r'[^\r\n.]+\.[^\r\n.]+\.([0-9]{3})'
string = '''
ABC.123.456XX
ABCOUOU.123123123.000871XX
ABCanything_else.123123123.111871XX
'''

print(re.findall(regex, string))

Output

['456', '000', '111']

If you wish to simplify, modify, or explore the expression, regex101.com explains each part of it in the top-right panel, and you can also watch there how it matches against some sample inputs.
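One thing worth checking is that the expression is not anchored, so on a line with more than two periods it can match further along the line. A possible variant, assuming each record sits on its own line, adds ^ with re.MULTILINE. The names UNANCHORED and ANCHORED below are illustrative only.

```python
import re

# Expression from the answer, unchanged.
UNANCHORED = re.compile(r'[^\r\n.]+\.[^\r\n.]+\.([0-9]{3})')

# Assumed variant: ^ with re.MULTILINE forces the match to begin at the
# start of a line, so only the second period of each line counts.
ANCHORED = re.compile(r'^[^\r\n.]+\.[^\r\n.]+\.([0-9]{3})', re.MULTILINE)

sample = 'a.b.c.123X'
print(UNANCHORED.findall(sample))  # ['123'] (digits after the third period)
print(ANCHORED.findall(sample))    # []

lines = 'ABC.123.456XX\nABCOUOU.123123123.000871XX'
print(ANCHORED.findall(lines))     # ['456', '000']
```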


Upvotes: 1
