Reputation: 101
I want to extract the number that appears after a set of characters ('AA='). However: (i) I do not know how long the number is, (ii) I do not know what appears right after the number (it could be a blank space or any character except 0-9; I do not know what these characters could be, but they are definitely not 0-9), and (iii) the number can be in exponential form (lines 4 and 5 below).
Given below are a few of the many inputs that I can have.
Line 1: 123 NUBA AA=1.2345 $BB=1234.55
Line 2: 123 NUBA MM AA=1.2345678&BB=1234.55
Line 3: 123 NUBA RRNJH AA=1.2#ALPHA
Line 4: 123 NUBA ABCD AA=1.2E-5 GBRO
Line 5: 123 NUBA ABCD AA=1.245E-7$ MN
...
The result should be:
1.2345
1.2345678
1.2
1.2e-5
1.245e-7
for each respective line above.
PS: I know how to use .find to get the starting location of AA=, but that is not very helpful under the above conditions. I also understand that one way would be to loop through each character after AA= and break if a blank space or anything except [0-9, ., E, -] is seen, but that is clumsy and takes up unnecessary space in my code. I am looking for a neater way of doing this.
Upvotes: 0
Views: 265
Reputation: 163362
You could use a single pattern with a capture group. With re.findall, for example, you get only the values of the capture group.
\bAA=(\d+(?:\.\d+)?(?:[eE][-+]?[0-9]+)?)
Explanation
\bAA=
A word boundary, then match AA=
(
Capture group 1
\d+
Match 1+ digits
(?:\.\d+)?
Match an optional decimal part
(?:[eE][-+]?[0-9]+)?
Match an optional exponential part
)
Close group 1
import re
regex = r"\bAA=(\d+(?:\.\d+)?(?:[eE][-+]?[0-9]+)?)"
s = ("Line 1: 123 NUBA AA=1.2345 $BB=1234.55\n"
"Line 2: 123 NUBA MM AA=1.2345678&BB=1234.55\n"
"Line 3: 123 NUBA RRNJH AA=1.2#ALPHA\n"
"Line 4: 123 NUBA ABCD AA=1.2E-5 GBRO\n"
"Line 5: 123 NUBA ABCD AA=1.245E-7$ MN")
print(re.findall(regex, s))
Output
['1.2345', '1.2345678', '1.2', '1.2E-5', '1.245E-7']
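Note that the question's expected output writes the exponent in lower case (1.2e-5), while the pattern returns the text as it appears in the input (1.2E-5). A minimal sketch of one way to handle that, and to cope with lines that contain no AA= at all, is shown below. The helper name extract_aa is hypothetical and only for illustration; it uses re.search on one line at a time instead of re.findall on the whole text.

```python
import re

# Same pattern as in the answer above, compiled once for reuse.
AA_PATTERN = re.compile(r"\bAA=(\d+(?:\.\d+)?(?:[eE][-+]?[0-9]+)?)")


def extract_aa(line):
    """Return the number after 'AA=' as a string, or None if absent.

    extract_aa is a hypothetical helper name, not part of the re module.
    """
    match = AA_PATTERN.search(line)
    if match is None:
        return None
    # Lower-case the exponent marker so '1.2E-5' becomes '1.2e-5'.
    return match.group(1).lower()


lines = [
    "123 NUBA AA=1.2345 $BB=1234.55",
    "123 NUBA ABCD AA=1.2E-5 GBRO",
    "123 NUBA no value here",
]
results = [extract_aa(line) for line in lines]
print(results)  # ['1.2345', '1.2e-5', None]
```

If a numeric value is needed rather than text, float() accepts these strings directly, including the exponential form.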
Upvotes: 2
Reputation: 1817
This gives the output you want for inputs without an exponent (the pattern used here does not match the exponential form):
import re
string1 = '123 NUBA AA=1.2345 $BB=1234.55'
string2 = '123 NUBA MM AA=1.2345678&BB=1234.55'
string3 = '123 NUBA RRNJH AA=1.2#ALPHA'
# Slice from "AA=" onward, then take the first number found in the slice.
print(re.findall(r'\d+\.?\d*', string1[string1.find("AA="):])[0])
print(re.findall(r'\d+\.?\d*', string2[string2.find("AA="):])[0])
print(re.findall(r'\d+\.?\d*', string3[string3.find("AA="):])[0])
Output
1.2345
1.2345678
1.2
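Two caveats with the find-and-slice approach: it stops at the mantissa for inputs such as AA=1.2E-5, and when AA= is missing, str.find returns -1, so the slice becomes the last character of the string instead of failing cleanly. A possible variant, sketched under the assumption that returning None for a missing key is acceptable, uses str.partition and re.match. The helper name number_after is hypothetical.

```python
import re

# Optional decimal part and optional exponent; re.match anchors the
# pattern at the start of whatever follows "AA=".
NUMBER = re.compile(r"\d+(?:\.\d+)?(?:[eE][-+]?\d+)?")


def number_after(text, key="AA="):
    """Hypothetical helper: return the number right after `key`, or None."""
    _, found, rest = text.partition(key)
    if not found:  # key is absent; str.find would have returned -1 here
        return None
    match = NUMBER.match(rest)
    return match.group(0) if match else None


print(number_after("123 NUBA RRNJH AA=1.2#ALPHA"))    # 1.2
print(number_after("123 NUBA ABCD AA=1.245E-7$ MN"))  # 1.245E-7
print(number_after("123 NUBA ABCD"))                  # None
```

Unlike the \bAA= pattern in the other answer, partition looks for the first plain occurrence of the key, so a string such as BAA=1 would also match.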
Upvotes: 1