Jason

Reputation: 47

How to match numbers that don't contain a decimal point using Python regex only

I have an input string "10 3.14 5 2.718" and want to get the output ["10", "5"] using regex only.

I tried Gemini and Copilot, but neither produced the right output: their proposed solutions keep using \b in the regex pattern, and no matter how many times I prompt them, the solutions don't work. The problem is that the boundary between a digit and a decimal point also counts as a word boundary (\b).

My code is below:

import re
text = "10 3.14 5 2.718"
pattern = r"\d+(?!\.\d+)"
matches = re.findall(pattern, text)
print(matches)

Upvotes: -2

Views: 79

Answers (2)

Amadan

Reputation: 198456

To only match the integral 10 and 5 from 10 3.14 5 2.718, use this:

(?<!\.)\b\d+\b(?!\.)

Starting from a position not preceded by a decimal point, match a word boundary, then as many digits as possible, then another word boundary, and end at a position not followed by a decimal point.
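A minimal sketch of how this pattern behaves on the sample string from the question (the variable names are only for illustration):

```python
import re

text = "10 3.14 5 2.718"

# (?<!\.)  the match must not be preceded by a decimal point
# \b\d+\b  a whole run of digits, delimited by word boundaries
# (?!\.)   the match must not be followed by a decimal point
pattern = r"(?<!\.)\b\d+\b(?!\.)"

matches = re.findall(pattern, text)
print(matches)  # ['10', '5']
```

One consequence worth keeping in mind: the lookahead only checks for a dot, so an integer directly followed by a sentence-ending period (such as the 5 in "I have 5.") is rejected as well.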

Upvotes: 1

Tim Biegeleisen

Reputation: 522506

If you want to catch integers only, you should use the pattern:

(?<!\S)\d+(?!\S)

Updated code:

import re
text = "10 3.14 5 2.718"
pattern = r"(?<!\S)\d+(?!\S)"
matches = re.findall(pattern, text)
print(matches)

The pattern used above says to find all integers which are surrounded on both sides by either whitespace or the start/end of the string. Note carefully that regular word boundaries won't work here:

\b\d+\b

The reason is that the boundary between a digit and a dot constitutes a word boundary. So doing an re.findall on 2.718 using \b\d+\b would match both 2 and 718.
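To make the difference concrete, here is a small comparison of the two patterns on the same input. It is only a sketch using the standard re module, and the variable names are illustrative:

```python
import re

text = "10 3.14 5 2.718"

# Plain word boundaries: the dot splits each decimal into two separate "words",
# so the integer and fractional parts are matched individually.
with_word_boundaries = re.findall(r"\b\d+\b", text)
print(with_word_boundaries)  # ['10', '3', '14', '5', '2', '718']

# Whitespace lookarounds: the digits must be delimited by whitespace
# or by the start/end of the string, so the decimals are skipped entirely.
with_whitespace_lookarounds = re.findall(r"(?<!\S)\d+(?!\S)", text)
print(with_whitespace_lookarounds)  # ['10', '5']
```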

Upvotes: 5
