nuki

Reputation: 101

Extract continuous numeric characters from a string in Python

I want to extract the number that appears after a set of characters ('AA='). The difficulties are: (i) I don't know how long the number is; (ii) I don't know what appears right after the number (it could be a blank space or ANY character except 0-9; I don't know what these characters could be, only that they are definitely not 0-9); (iii) the number can be in exponential form (Lines 4 and 5 below).

Below are a few of the many inputs I can have.

Line 1: 123 NUBA AA=1.2345 $BB=1234.55
Line 2: 123 NUBA MM AA=1.2345678&BB=1234.55
Line 3: 123 NUBA RRNJH AA=1.2#ALPHA
Line 4: 123 NUBA ABCD AA=1.2E-5 GBRO
Line 5: 123 NUBA ABCD AA=1.245E-7$ MN
...

The results for the lines above should be 1.2345, 1.2345678, 1.2, 1.2e-5 and 1.245e-7 respectively.

PS: I know how to use .find to get the starting position of AA=, but that alone doesn't help with the conditions above. I also understand that one option is to loop through each character after AA= and break when a blank space or anything other than 0-9, '.', 'E' or '-' is seen, but that is clumsy and takes up unnecessary space in my code. I am looking for a neater way of doing this.
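For comparison, the character-by-character loop described above can be written compactly with itertools.takewhile. This is a minimal sketch that assumes the number consists only of the characters 0-9, '.', 'e', 'E', '+' and '-'; the helper name value_after is hypothetical.

```python
from itertools import takewhile

# Characters assumed to make up the number: digits, decimal point, exponent marker, sign
NUMBER_CHARS = set("0123456789.eE+-")

def value_after(line, key="AA="):
    """Hypothetical helper: return the characters after `key` up to the first
    one that cannot be part of a number, or None if `key` is not present."""
    start = line.find(key)
    if start == -1:  # str.find returns -1 when the key is missing
        return None
    tail = line[start + len(key):]
    # Keep consuming characters while they belong to NUMBER_CHARS
    return "".join(takewhile(lambda ch: ch in NUMBER_CHARS, tail))

print(value_after("123 NUBA ABCD AA=1.245E-7$ MN"))  # 1.245E-7
```

This does not validate the result: a value such as "AA=1.2Extra" would yield "1.2E", and a '-' directly after the number would be swallowed. The regex-based answers are stricter about what counts as a number.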

Upvotes: 0

Views: 265

Answers (2)

The fourth bird

Reputation: 163362

You could use a single pattern with a capture group. re.findall, for example, then returns only the value of the capture group.

\bAA=(\d+(?:\.\d+)?(?:[eE][-+]?[0-9]+)?)

Explanation

  • \bAA= A word boundary, then match AA=
  • ( Capture group 1
    • \d+ Match 1+ digits
    • (?:\.\d+)? Match an optional decimal part
    • (?:[eE][-+]?[0-9]+)? Match an optional exponential part
  • ) Close group 1
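To check the pattern one line at a time, re.search can be used instead of re.findall; match.group(1) then holds the captured number. A small sketch, where the helper name extract_aa and the extra sample strings are illustrative and not from the question:

```python
import re

# Same pattern as above: "AA=", then digits, an optional decimal part, an optional exponent
AA_NUMBER = re.compile(r"\bAA=(\d+(?:\.\d+)?(?:[eE][-+]?[0-9]+)?)")

def extract_aa(line):
    """Hypothetical helper: return the number after AA= as a string, or None."""
    match = AA_NUMBER.search(line)
    return match.group(1) if match else None

print(extract_aa("X AA=42; Y"))    # 42    (integer, no decimal part)
print(extract_aa("X AA=3e+10 Y"))  # 3e+10 (explicit '+' in the exponent)
print(extract_aa("X AA=.5 Y"))     # None  (a leading digit is required)
print(extract_aa("X BB=1.0 Y"))    # None  (no AA= present)
```

Note that the pattern requires at least one digit before the decimal point, so a value written as ".5" is not matched.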


import re
 
regex = r"\bAA=(\d+(?:\.\d+)?(?:[eE][-+]?[0-9]+)?)"
 
s = ("Line 1: 123 NUBA AA=1.2345 $BB=1234.55\n"
    "Line 2: 123 NUBA MM AA=1.2345678&BB=1234.55\n"
    "Line 3: 123 NUBA RRNJH AA=1.2#ALPHA\n"
    "Line 4: 123 NUBA ABCD AA=1.2E-5 GBRO\n"
    "Line 5: 123 NUBA ABCD AA=1.245E-7$ MN")
 
print(re.findall(regex, s))

Output

['1.2345', '1.2345678', '1.2', '1.2E-5', '1.245E-7']
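re.findall returns strings, and the exponent keeps its original uppercase E. If numeric values are needed, each string can be passed to float(); if the exact text form from the question (1.2e-5) is wanted, str.lower() is enough. A short sketch reusing two of the sample lines:

```python
import re

regex = r"\bAA=(\d+(?:\.\d+)?(?:[eE][-+]?[0-9]+)?)"
s = ("Line 4: 123 NUBA ABCD AA=1.2E-5 GBRO\n"
     "Line 5: 123 NUBA ABCD AA=1.245E-7$ MN")

# float() accepts both "E" and "e" as the exponent marker
values = [float(v) for v in re.findall(regex, s)]
print(values)   # [1.2e-05, 1.245e-07]

# Keep the values as text, but with a lowercase exponent marker
lowered = [v.lower() for v in re.findall(regex, s)]
print(lowered)  # ['1.2e-5', '1.245e-7']
```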


Upvotes: 2

Mitchell Olislagers

Reputation: 1817

This gives the expected output for the first three sample lines:

import re

string1 = '123 NUBA AA=1.2345 $BB=1234.55'
string2 = '123 NUBA MM AA=1.2345678&BB=1234.55'
string3 = '123 NUBA RRNJH AA=1.2#ALPHA'

# Slice each string from "AA=" onward and print the first number found there
print(re.findall(r'\d+\.*\d*', string1[string1.find("AA="):])[0])
print(re.findall(r'\d+\.*\d*', string2[string2.find("AA="):])[0])
print(re.findall(r'\d+\.*\d*', string3[string3.find("AA="):])[0])

Output

1.2345
1.2345678
1.2
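The pattern in this answer stops at the E, so for Lines 4 and 5 it would return only 1.2 and 1.245. A sketch of the same slicing approach with an optional exponent appended to the pattern (an assumed extension, not part of the original answer). It also uses \.? instead of \.* so at most one decimal point is accepted, and it guards against str.find returning -1; the helper name first_number_after is hypothetical.

```python
import re

# Digits, an optional decimal point and fraction, then an optional exponent
NUMBER = r'\d+\.?\d*(?:[eE][-+]?\d+)?'

def first_number_after(line, key="AA="):
    """Hypothetical helper: first number in the text starting at `key`, or None."""
    start = line.find(key)
    if start == -1:  # str.find returns -1 when the key is missing
        return None
    found = re.findall(NUMBER, line[start:])
    return found[0] if found else None

print(first_number_after('123 NUBA ABCD AA=1.2E-5 GBRO'))   # 1.2E-5
print(first_number_after('123 NUBA ABCD AA=1.245E-7$ MN'))  # 1.245E-7
```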

Upvotes: 1
