Reputation: 65
I'm trying to get the first number (int and float) after a specific pattern:
strings = ["Building 38 House 10",
           "Building : 10.5 house 900"]
for x in strings:
    print(<rule>)
Wanted result:
'38'
'10.5'
I tried:
for x in strings:
    print(re.findall(r"(?<=Building).+\d+", x))
    print(re.findall(r"(?<=Building).+(\d+.?\d+)", x))
Output:
[' 38 House 10']
['10']
[' : 10.5 house 900']
['00']
But I'm missing something.
Upvotes: 2
Views: 505
Reputation: 18490
One idea is to use \D (negated \d) to match any non-digits in between and capture the number:
Building\D*\b([\d.]+)
See this demo at regex101 or Python demo at tio.run
Just to mention, use word boundaries \b around Building to match the full word.
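A minimal Python sketch of this idea, using the sample strings from the question. The helper name first_number_after_building is made up for illustration, and the pattern is the one above with \b added around Building. Note that [\d.]+ is permissive: it would also capture something like 1.2.3 or a trailing dot as in "38.".

```python
import re

# Pattern from the answer, with word boundaries around "Building".
PATTERN = re.compile(r"\bBuilding\b\D*\b([\d.]+)")

def first_number_after_building(text):
    """Hypothetical helper: first number after the word Building, or None."""
    m = PATTERN.search(text)
    return m.group(1) if m else None

strings = ["Building 38 House 10",
           "Building : 10.5 house 900"]

# \D* skips every non-digit after "Building"; group 1 is the number itself.
results = [first_number_after_building(s) for s in strings]
print(results)  # ['38', '10.5']
```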
Upvotes: 1
Reputation: 163352
You could use a capture group:
\bBuilding[\s:]+(\d+(?:\.\d+)?)\b
Explanation
\bBuilding - Match the word Building
[\s:]+ - Match 1+ whitespace chars or colons
(\d+(?:\.\d+)?) - Capture group 1, match 1+ digits with an optional decimal part
\b - A word boundary

import re
strings = ["Building 38 House 10",
           "Building : 10.5 house 900"]
pattern = r"\bBuilding[\s:]+(\d+(?:\.\d+)?)"
for x in strings:
    m = re.search(pattern, x)
    if m:
        print(m.group(1))
Output
38
10.5
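Since the question asks for ints and floats, group 1 can be converted after matching. This is only a sketch: the helper name parse_number is hypothetical, and it assumes that a dot in the captured text means the value should become a float.

```python
import re

# Same pattern as above, including the trailing word boundary.
pattern = re.compile(r"\bBuilding[\s:]+(\d+(?:\.\d+)?)\b")

def parse_number(text):
    """Hypothetical helper: return an int or float, or None if nothing matches."""
    m = pattern.search(text)
    if m is None:
        return None
    raw = m.group(1)
    # Assumption: a decimal point means float, otherwise int.
    return float(raw) if "." in raw else int(raw)

strings = ["Building 38 House 10",
           "Building : 10.5 house 900"]
values = [parse_number(s) for s in strings]
print(values)  # [38, 10.5]
```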
Upvotes: 2
Reputation: 364
re.findall(r"(?<![a-zA-Z:])[-+]?\d*\.?\d+", x)
This will find all numbers in the given string (the lookbehind rejects a match that starts right after a letter or a colon).
If you want the first number only you can access it simply through indexing:
re.findall(r"(?<![a-zA-Z:])[-+]?\d*\.?\d+", x)[0]
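Indexing with [0] raises an IndexError when the string contains no number. A hedged sketch of a safer variant uses re.search with the same pattern; the helper name first_number is hypothetical. Keep in mind that this pattern is not tied to the word Building, so it returns the first number anywhere in the string.

```python
import re

# Same pattern as in the answer above.
NUMBER = re.compile(r"(?<![a-zA-Z:])[-+]?\d*\.?\d+")

def first_number(text):
    """Hypothetical helper: first number in text, or None if there is none."""
    m = NUMBER.search(text)
    return m.group(0) if m else None

strings = ["Building 38 House 10",
           "Building : 10.5 house 900"]

all_numbers = [NUMBER.findall(s) for s in strings]
firsts = [first_number(s) for s in strings]
print(all_numbers)  # [['38', '10'], ['10.5', '900']]
print(firsts)       # ['38', '10.5']
print(first_number("no digits here"))  # None
```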
Upvotes: 0