Reputation: 252
I want to write a regular expression in Python that matches numbers, including decimals and scientific notation. Here is my snippet:
import re
pattern = "\d*\.?\d+[Ee]?[+-]?\d*"
r = re.compile(pattern)
txt = """
12
.12
12.5
12.5E4
12.5e4
12.4E+4
12E4
12e-4
"""
x = r.findall(txt)
print(x)
This code finds all the valid inputs in txt, but invalid inputs such as
.12e, 12.3+4
are also matched. How can I fix this?
Upvotes: 0
Views: 1415
Reputation: 43673
I suggest using this regex pattern:
^(?=\.?\d)\d*(?:\.\d*)?(?:[eE][+-]?\d+)?$
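A minimal sketch of how the pattern above could be applied to multi-line text like the question's txt. It assumes one candidate per line and uses re.MULTILINE so that ^ and $ anchor at line boundaries; the shortened sample text and the variable names are illustrative, not from the original post.

```python
import re

# Anchored pattern from the answer; MULTILINE makes ^ and $ match per line.
pattern = re.compile(r"^(?=\.?\d)\d*(?:\.\d*)?(?:[eE][+-]?\d+)?$", re.MULTILINE)

# Illustrative sample: four valid lines plus the two invalid inputs
# mentioned in the question.
txt = """
12
.12
12.5E4
12e-4
.12e
12.3+4
"""

matches = pattern.findall(txt)
print(matches)  # ['12', '.12', '12.5E4', '12e-4']
```

Without re.MULTILINE, ^ and $ would only anchor at the start and end of the whole string, so nothing in this text would match.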
Upvotes: 1
Reputation: 6616
Don't use regular expressions when you don't need to. It's more Pythonic[tm] (and easier, and more reliable) to let Python determine which ones are valid.
results = []
for line in txt.split():
    try:
        float(line)  # raises ValueError if the text is not a valid number
    except ValueError:
        pass
    else:
        results.append(line)
print(results)
Upvotes: 1
Reputation: 142146
Or, avoiding regexes altogether, use the Python tokenizer to find them:
test2.txt
some bumph
2.34
1.7e2
some more bumph
sample code
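import ast
from tokenize import generate_tokens, NUMBER

with open('test2.txt') as f:
    # Keep only NUMBER tokens; literal_eval converts the literal text to a
    # value without executing arbitrary code the way eval would.
    numbers = [(tok.string, ast.literal_eval(tok.string))
               for tok in generate_tokens(f.readline) if tok.type == NUMBER]
print(numbers)
# [('2.34', 2.34), ('1.7e2', 170.0)]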
Upvotes: 0
Reputation: 185
Here you are, probably the simplest yet:
^(\d*\.?\d+([Ee][+-]?\d+)?)$
Replace the ^ and $ with whatever delimiters you want, such as whitespace.
Solution explained:
Your solution
\d*\.?\d+[Ee]?[+-]?\d*
allowed an E to appear without any digits after it, hence the \d+ at the end of mine. I also put the E, the optional +/-, and that mandatory digit in a single group (i.e., enclosed them in parentheses) so they can't exist without each other. That entire group ([Ee][+-]?\d+) is optional (?) to accommodate your number examples without that notation.
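To make the difference concrete, here is a small sketch comparing the question's pattern with a grouped-exponent version. It uses re.fullmatch instead of explicit ^ and $ anchors, which is my choice for the demo rather than something from the answer; the variable names are illustrative.

```python
import re

# The question's pattern: E, sign, and exponent digits are independently optional.
original = re.compile(r"\d*\.?\d+[Ee]?[+-]?\d*")
# Grouped version: E, optional sign, and at least one digit come as a unit.
fixed = re.compile(r"\d*\.?\d+(?:[Ee][+-]?\d+)?")

for s in [".12e", "12.3+4", "12e-4"]:
    # fullmatch succeeds only if the entire string matches the pattern.
    print(s, bool(original.fullmatch(s)), bool(fixed.fullmatch(s)))
# .12e True False
# 12.3+4 True False
# 12e-4 True True
```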
Upvotes: 0
Reputation: 2743
Something like this should do it (untested); the sign after the E is optional:
r"\d*\.?\d+(?:[Ee][+-]?\d)?\d*"
Upvotes: 0
Reputation: 213338
The traditional regexp is along these lines:
pattern = (
    "(?:"
    r"\d+(?:\.\d+)?(?:[Ee][-+]?\d+)?"  # digits, optional fraction, optional exponent
    "|"
    r"\.\d+(?:[Ee][+-]?\d+)?"          # leading dot, digits, optional exponent
    ")"
)
But you can always do things the easy way:
def is_number(x):
    try:
        float(x)
        return True
    except ValueError:
        return False
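One thing worth knowing before choosing the easy way: float() is more permissive than the regexes in this thread. The sketch below repeats is_number so it runs on its own; the candidate strings are my own illustrative inputs, and the underscore case assumes Python 3.6 or later.

```python
def is_number(x):
    try:
        float(x)
        return True
    except ValueError:
        return False

# Illustrative inputs: one valid number, the question's two invalid inputs,
# and a few strings that float() accepts but the regexes would reject.
candidates = ["12.5e4", ".12e", "12.3+4", "inf", "nan", " 12 ", "1_000"]
accepted = [c for c in candidates if is_number(c)]
print(accepted)  # ['12.5e4', 'inf', 'nan', ' 12 ', '1_000']
```

If values like inf, nan, or padded strings should be rejected, a regex check (or an extra filter on top of float()) is probably still needed.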
Upvotes: 1
Reputation: 208475
Try changing your regex to the following:
\d*\.?\d+(?:[Ee][+-]?\d+)?
This makes it so that if the e or E is there, there is always at least one digit after it, and so that + and - are only valid if they follow the e or E.
Note that you should be using a raw string literal to make sure the backslashes are escaped properly (doesn't affect this string in particular, but if you tried to use something like \b in your regex you would see the difference):
pattern = r"\d*\.?\d+(?:[Ee][+-]?\d+)?"
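A short sketch of the \b case mentioned above, showing why the raw string matters. The word "cat" and the test string are arbitrary examples of mine.

```python
import re

plain = "\bcat\b"   # in a normal string, "\b" is a single backspace character
raw = r"\bcat\b"    # in a raw string, the regex engine sees \b (word boundary)

print(len(plain), len(raw))  # 5 7

# The plain version looks for literal backspace characters and finds none;
# the raw version matches "cat" as a whole word.
print(bool(re.search(plain, "a cat")), bool(re.search(raw, "a cat")))  # False True
```

Sequences such as \d are not recognized string escapes, so they happen to survive in a normal string, but newer Python versions emit a warning about them, which is another reason to prefer raw strings for regexes.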
Upvotes: 0
Reputation: 3830
You can try: \d*\.?\d+(?:[Ee][+-]?\d+)?$
This marks the exponent part as a group. I also added a $ to make sure it matches the end of the string.
Also, since your regex contains \, you should use a raw string literal, for example r'\n', which is a literal backslash followed by n, not the newline character.
The easier way would be to use float() and check for a ValueError exception.
Upvotes: 0