Reputation: 47
I have an input string "10 3.14 5 2.718" and want to get the output ["10", "5"] using regex only.
I tried Gemini and Copilot, but neither produced the right output: their proposed solutions keep using \b in the regex pattern, and no matter how many times I re-prompt, the solutions don't work. The problem is that a decimal point also counts as a word boundary (\b).
My code is below:
import re
text = "10 3.14 5 2.718"
pattern = r"\d+(?!\.\d+)"
matches = re.findall(pattern, text)
print(matches)
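For reference, here is what the pattern above actually returns on this input. The negative lookahead does reject 3 and 2, because each is followed by a dot and more digits, but the regex engine then tries later start positions. The fractional digits 14 and 718 are not followed by a dot, so they match.

```python
import re

text = "10 3.14 5 2.718"
# Pattern from the question: one or more digits not followed by "." plus more digits
pattern = r"\d+(?!\.\d+)"

matches = re.findall(pattern, text)
print(matches)  # ['10', '14', '5', '718']
```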
Upvotes: -2
Views: 79
Reputation: 198456
To match only the integers 10 and 5 from 10 3.14 5 2.718, use this:
(?<!\.)\b\d+\b(?!\.)
Starting at a position not preceded by a decimal point, match a word boundary, then as many digits as possible, then another word boundary, ending at a position not followed by a decimal point.
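The same pattern can be written with Python's re.VERBOSE flag so that each piece carries a comment. This is only a restatement of the pattern above for readability; the name INTEGER_RE is illustrative.

```python
import re

# The pattern from this answer, split into annotated parts (whitespace and
# "#" comments are ignored by the regex engine under re.VERBOSE)
INTEGER_RE = re.compile(
    r"""
    (?<!\.)   # the match must not start right after a decimal point
    \b        # word boundary before the first digit
    \d+       # one or more digits (greedy)
    \b        # word boundary after the last digit
    (?!\.)    # the match must not end right before a decimal point
    """,
    re.VERBOSE,
)

text = "10 3.14 5 2.718"
matches = INTEGER_RE.findall(text)
print(matches)  # ['10', '5']
```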
Upvotes: 1
Reputation: 522506
If you want to match integers only, use the pattern:
(?<!\S)\d+(?!\S)
Updated code:
import re
text = "10 3.14 5 2.718"
pattern = r"(?<!\S)\d+(?!\S)"
matches = re.findall(pattern, text)
print(matches)
The pattern above finds all integers that are surrounded on both sides by either whitespace or the start/end of the string. Note carefully that regular word boundaries won't work here:
\b\d+\b
The reason is that the boundary between a digit and a dot constitutes a word boundary. So calling re.findall on 2.718 using \b\d+\b would match both 2 and 718.
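A small sketch checking both claims: plain \b\d+\b splits each decimal at the dot, while the whitespace-bounded pattern still returns only the integers. The second input, with a tab and a newline as separators, is an illustrative example not taken from the question; it works because \S excludes every whitespace character, not just spaces.

```python
import re

text = "10 3.14 5 2.718"

# Plain word boundaries: the "." between digits is itself a boundary,
# so the integer and fractional parts of each decimal come back separately
split_decimal = re.findall(r"\b\d+\b", "2.718")
split_all = re.findall(r"\b\d+\b", text)
print(split_decimal)  # ['2', '718']
print(split_all)      # ['10', '3', '14', '5', '2', '718']

# Whitespace-bounded version: digits must be preceded and followed by
# whitespace (space, tab, newline, ...) or the start/end of the string
ws_pattern = r"(?<!\S)\d+(?!\S)"
ws_spaces = re.findall(ws_pattern, text)
ws_mixed = re.findall(ws_pattern, "10\t3.14\n5 2.718")
print(ws_spaces)  # ['10', '5']
print(ws_mixed)   # ['10', '5']
```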
Upvotes: 5