Extract continuous numeric characters from a string in Python

Question

I want to extract a number that appears after a fixed set of characters ('AA='). However: (i) I do not know how long the number is, (ii) I do not know what appears right after the number (it could be a blank space or ANY character except 0-9; I do not know what these characters could be, but they are definitely not 0-9), and (iii) the number can be in exponential form (Lines 4 and 5 below).

Below are a few of the many inputs I can have.

Line 1: 123 NUBA AA=1.2345 $BB=1234.55
Line 2: 123 NUBA MM AA=1.2345678&BB=1234.55
Line 3: 123 NUBA RRNJH AA=1.2#ALPHA
Line 4: 123 NUBA ABCD AA=1.2E-5 GBRO
Line 5: 123 NUBA ABCD AA=1.245E-7$ MN
...

The results should be 1.2345, 1.2345678, 1.2, 1.2e-5, and 1.245e-7 for the respective lines above.

PS: I know how to use .find to get the starting location of AA=, but that is not very helpful under the above conditions. I also understand that one way would be to loop through each character after AA= and break when a blank space or anything other than [0-9, ., E, -] is seen, but that is clumsy and takes up unnecessary space in my code. I am looking for a neater way of doing this.

The fourth bird · Accepted Answer

You could use a single pattern with a capture group. With re.findall, for example, you get only the value of the capture group.

\bAA=(\d+(?:\.\d+)?(?:[eE][-+]?[0-9]+)?)

Explanation

\bAA= A word boundary, then match AA=
( Capture group 1
- \d+ Match 1+ digits
- (?:\.\d+)? Match an optional decimal part
- (?:[eE][-+]?[0-9]+)? Match an optional exponential part
) Close group 1
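As a sketch of the breakdown above, the same pattern can be written with re.VERBOSE so that each part carries its own comment. The name AA_NUMBER and the sample string are only illustrative and are not part of the original answer.

```python
import re

# Same pattern as in the answer, spread over several lines with re.VERBOSE.
# In verbose mode, whitespace in the pattern is ignored and "#" starts a comment.
AA_NUMBER = re.compile(
    r"""
    \bAA=                     # word boundary, then the literal prefix AA=
    (                         # capture group 1: the number itself
        \d+                   # one or more digits
        (?:\.\d+)?            # optional decimal part
        (?:[eE][-+]?[0-9]+)?  # optional exponent part
    )
    """,
    re.VERBOSE,
)

match = AA_NUMBER.search("123 NUBA ABCD AA=1.245E-7$ MN")
print(match.group(1))  # 1.245E-7
```

Because of the word boundary, a prefix such as XAA= would not match, since there is no boundary between X and A.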

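import re

# Capture the number after "AA=": digits, optional decimal part, optional exponent
regex = r"\bAA=(\d+(?:\.\d+)?(?:[eE][-+]?[0-9]+)?)"

s = ("Line 1: 123 NUBA AA=1.2345 $BB=1234.55\n"
    "Line 2: 123 NUBA MM AA=1.2345678&BB=1234.55\n"
    "Line 3: 123 NUBA RRNJH AA=1.2#ALPHA\n"
    "Line 4: 123 NUBA ABCD AA=1.2E-5 GBRO\n"
    "Line 5: 123 NUBA ABCD AA=1.245E-7$ MN")

# findall returns only the contents of capture group 1 for each match
print(re.findall(regex, s))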

Output

['1.2345', '1.2345678', '1.2', '1.2E-5', '1.245E-7']
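The matches come back as strings. If numeric values are needed, one possible follow-up is to search each line separately and pass the captured text to float, which accepts both plain decimals and the E notation. The helper name extract_aa is hypothetical, and returning None for a line without AA= is an assumption about the desired behavior.

```python
import re

AA_PATTERN = re.compile(r"\bAA=(\d+(?:\.\d+)?(?:[eE][-+]?[0-9]+)?)")

def extract_aa(line):
    """Return the number after AA= as a float, or None if the line has no match."""
    m = AA_PATTERN.search(line)
    return float(m.group(1)) if m else None

lines = [
    "123 NUBA AA=1.2345 $BB=1234.55",
    "123 NUBA MM AA=1.2345678&BB=1234.55",
    "123 NUBA RRNJH AA=1.2#ALPHA",
    "123 NUBA ABCD AA=1.2E-5 GBRO",
    "123 NUBA ABCD AA=1.245E-7$ MN",
]

values = [extract_aa(line) for line in lines]
print(values)  # [1.2345, 1.2345678, 1.2, 1.2e-05, 1.245e-07]
```

Python prints the exponent in lowercase, which also matches the form asked for in the question.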
