Smile.Hunter

Reputation: 252

Python regular expression for validating numbers

I want to create a regular expression for the following Python snippet.

import re
pattern = "\d*\.?\d+[Ee]?[+-]?\d*"
r = re.compile(pattern)
txt = """
12
.12
12.5
12.5E4
12.5e4
12.4E+4
12E4
12e-4
"""
x = r.findall(txt)
print(x)

This code is fine for extracting all the valid inputs from txt, but invalid inputs such as

.12e, 12.3+4

are also accepted. How can I fix this?

Upvotes: 0

Views: 1415

Answers (8)

Ωmega

Reputation: 43673

I suggest using this regex pattern:

^(?=\.?\d)\d*(?:\.\d*)?(?:[eE][+-]?\d+)?$
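The `^` and `$` anchors only match at line boundaries when `re.MULTILINE` is set, so applying this pattern to a multi-line text like the `txt` in the question needs that flag. A minimal sketch, assuming a sample text similar to the one in the question plus the two invalid inputs (the variable names are illustrative):

```python
import re

# Pattern from this answer; re.MULTILINE lets ^ and $ match at every line.
number = re.compile(r"^(?=\.?\d)\d*(?:\.\d*)?(?:[eE][+-]?\d+)?$", re.MULTILINE)

txt = """
12
.12
12.5
12.5E4
12.4E+4
12e-4
.12e
12.3+4
"""

# No capturing groups, so findall returns the full text of each match.
matches = number.findall(txt)
print(matches)
```

With the anchors in place, a line either matches as a whole or not at all, so `.12e` and `12.3+4` should not show up in the result, not even partially.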

Upvotes: 1

Thijs van Dien

Reputation: 6616

Don't use regular expressions when you don't need to. It's more Pythonic[tm] (and easier, and more reliable) to let Python determine which ones are valid.

results = []
for line in txt.split():
    try:
        float(line)
    except ValueError:
        pass
    else:
        results.append(line)
print(results)

Upvotes: 1

Jon Clements

Reputation: 142146

Or, avoiding regexes altogether, use the Python tokenizer to find them:

test2.txt

some bumph
2.34
1.7e2
some more bumph

sample code

from tokenize import generate_tokens, NUMBER

with open('test2.txt') as f:
    # generate_tokens() takes a readline callable and yields 5-tuples
    numbers = [(val, eval(val)) for typ, val, _, _, _ in generate_tokens(f.readline) if typ == NUMBER]
print(numbers)
# [('2.34', 2.34), ('1.7e2', 170.0)]

Upvotes: 0

Jiman

Reputation: 185

Here you are, probably the simplest yet:

^(\d*\.?\d+([Ee][+-]?\d+)?)$

Replace the ^ and $ with whatever you want the delims to be, whitespace or whatnot.

Solution explained:

Your solution

\d*\.?\d+[Ee]?[+-]?\d*

allowed E's to be placed without digits, hence the \d+ at the end of mine. I also put the E, the optional +/-, and the mandatory digits that follow in a single group (i.e., enclosed it all in parentheses) so they can't exist without each other. That entire group ([Ee][+-]?\d+) is optional (?) to accommodate your number examples without that notation.
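To see the effect of grouping the exponent, individual strings can be checked with `re.fullmatch`, which plays the role of the `^` and `$` delimiters here. A small sketch; the helper name `is_valid` is made up for illustration, and the group is written as non-capturing:

```python
import re

# Same pattern as above without ^ and $; fullmatch anchors both ends.
NUMBER = re.compile(r"\d*\.?\d+(?:[Ee][+-]?\d+)?")

def is_valid(s):
    """Return True if the whole string s matches NUMBER."""
    return NUMBER.fullmatch(s) is not None

for s in ["12", ".12", "12.5E4", "12e-4", ".12e", "12.3+4"]:
    print(s, is_valid(s))
```

The first four strings are accepted, while `.12e` and `12.3+4` are rejected because an `e` or a sign can no longer appear without the digits that belong to it.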

Upvotes: 0

addiedx44

Reputation: 2743

Something like this should do it (untested):

"\d*\.?\d+(?:[Ee][+-]?\d)?\d*"

Upvotes: 0

Dietrich Epp

Reputation: 213338

The traditional regexp is along these lines:

pattern = (
    "(?:"
    r"\d+(?:\.\d+)?(?:[Ee][-+]?\d+)?"
    "|"
    r"\.\d+(?:[Ee][+-]?\d+)?"
    ")"
)

But you can always do things the easy way:

def is_number(x):
    try:
        float(x)
        return True
    except ValueError:
        return False
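One thing to keep in mind with this approach is that `float()` accepts more than the digit-based patterns in this thread: surrounding whitespace, a leading sign, and special values such as `nan` and `inf`. A short sketch of the difference, reusing `is_number` from above (the sample strings are only illustrative):

```python
def is_number(x):
    try:
        float(x)
        return True
    except ValueError:
        return False

# Strings that float() accepts but the digit-based regexes would reject.
for s in ["nan", "inf", " 12 ", "-1e3"]:
    print(repr(s), is_number(s))

# The invalid inputs from the question are rejected by float() as well.
for s in [".12e", "12.3+4"]:
    print(repr(s), is_number(s))
```

Whether that extra leniency is acceptable depends on what counts as a valid number for the input at hand.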

Upvotes: 1

Andrew Clark

Reputation: 208475

Try changing your regex to the following:

\d*\.?\d+(?:[Ee][+-]?\d+)?

This makes it so that if the e or E is there, there is always at least one digit, and so that + and - are only valid if they follow the e or E.

Note that you should be using a raw string literal to make sure the backslashes are escaped properly (doesn't affect this string in particular, but if you tried to use something like \b in your regex you would see the difference):

pattern = r"\d*\.?\d+(?:[Ee][+-]?\d+)?"
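The `\b` case mentioned above can be checked directly. A minimal sketch, with a made-up sample string:

```python
import re

# In a normal string "\b" is the single backspace character; in a raw
# string r"\b" is a backslash followed by "b", which the regex engine
# interprets as a word boundary.
print(len("\b"), len(r"\b"))

plain = re.findall("\b12\b", "12 and 123")   # looks for backspace characters
raw = re.findall(r"\b12\b", "12 and 123")    # looks for the whole word 12
print(plain, raw)
```

The non-raw pattern finds nothing because the text contains no backspace characters, while the raw pattern finds the standalone `12` and skips the `12` inside `123`.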

Upvotes: 0

quantum

Reputation: 3830

You can try: \d*\.?\d+(?:[Ee][+-]?\d+)?$. This marks the exponent part as a group. I also added a $ to make sure it matches the end of the string.

Also, since your regex contains \, you should use a raw string literal. For example, r'\n' is a literal backslash followed by n, not the newline character.

The easier way would be to use float() and check for ValueError exception.

Upvotes: 0
