Kelly S. French

Reputation: 12334

How to detect a floating point number using a regular expression

What is a good regular expression for matching a floating point number (like Java's Float)?

The answer must match against the following targets:

 1) 1.  
 2) .2   
 3) 3.14  
 4) 5e6  
 5) 5e-6  
 6) 5E+6  
 7) 7.e8  
 8) 9.0E-10  
 9) .11e12  
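A small harness is a quick way to check any candidate pattern against these nine targets. This is a minimal Python sketch. It assumes a target only counts if the pattern matches the whole string (hence `re.fullmatch`). `TARGETS` and `failing_targets` are names made up for illustration:

```python
import re

# The nine targets from the question, as plain strings.
TARGETS = ["1.", ".2", "3.14", "5e6", "5e-6", "5E+6", "7.e8", "9.0E-10", ".11e12"]

def failing_targets(pattern: str) -> list[str]:
    """Return the targets that the pattern does NOT match in full."""
    compiled = re.compile(pattern)
    return [t for t in TARGETS if compiled.fullmatch(t) is None]

# Example: a deliberately incomplete pattern with no exponent part.
print(failing_targets(r"[0-9]*\.?[0-9]+"))
```

The sample pattern has no exponent part and needs a digit after any decimal point, so it misses "1." and every target containing an e/E.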

In summary, it should

For those who are wondering, yes, this is a homework problem. We received it as an assignment in my graduate CS class on compilers. I've already turned in my answer for the class and will post it as an answer to this question.

[Epilogue] My solution didn't get full credit because it didn't handle more than one digit to the left of the decimal. The assignment did mention handling Java floats, even though none of the examples had more than one digit to the left of the decimal. I'll post the accepted answer in its own post.

Upvotes: 19

Views: 48600

Answers (7)

Kelly S. French

Reputation: 12334

Here is what I turned in.

(([1-9]+\.[0-9]*)|([1-9]*\.[0-9]+)|([1-9]+))([eE][-+]?[0-9]+)?

To make it easier to discuss, I'll label the sections:

( ([1-9]+ \. [0-9]* ) | ( [1-9]* \. [0-9]+ ) | ([1-9]+))  ( [eE] [-+]? [0-9]+ )?     
--------------------------------------------------------  ----------------------    
                           A                                       B

A: matches everything up to the 'e/E'
B: matches the scientific notation

Breaking down A we get three parts

 ( ([1-9]+ \. [0-9]* ) | ( [1-9]* \. [0-9]+ ) | ([1-9]+) )
   ----------1----------   ---------2----------   ---3----

Part 1: Allows 1 or more digits from 1-9, decimal, 0 or more digits after the decimal (target 1)
Part 2: Allows 0 or more digits from 1-9, decimal, 1 or more digits after the decimal (target 2)
Part 3: Allows 1 or more digits from 1-9 with no decimal (target 4)
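You can see which alternative of part A picks up which mantissa (the text before any e/E) by trying each alternative on its own. This is a rough Python sketch. `PART_A` and `alternatives_matching` are hypothetical helper names:

```python
import re

# The three alternatives of part A, exactly as in the submitted regex.
PART_A = {
    1: r"[1-9]+\.[0-9]*",  # digits, decimal point, optional digits
    2: r"[1-9]*\.[0-9]+",  # optional digits, decimal point, digits
    3: r"[1-9]+",          # digits with no decimal point
}

def alternatives_matching(mantissa: str) -> list[int]:
    """Return the numbers of the part-A alternatives that match the whole mantissa."""
    return [n for n, pat in PART_A.items() if re.fullmatch(pat, mantissa)]

for mantissa in ["1.", ".2", "3.14", "5", "7.", "9.0", ".11", "10."]:
    print(mantissa, alternatives_matching(mantissa))
```

A mantissa like "3.14" is matched by both part 1 and part 2, which is harmless. A mantissa such as "10." is matched by none of them, because every integer digit has to come from [1-9].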


Breaking down B we get 4 basic parts

 ( [eE] [-+]? [0-9]+ )?
   -1-- --2-- --3--- -4

Part 1: requires either upper or lowercase 'e' for scientific notation (e.g. targets 8 & 9)
Part 2: allows an optional positive or negative sign for the exponent (e.g. targets 4, 5, & 6)
Part 3: allows 1 or more digits for the exponent (target 8)
Part 4: allows the scientific notation to be optional as a group (target 3)
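Part B can be checked in isolation first, and then the complete expression can be run against all nine targets. This is a minimal Python sketch, again assuming whole-string matching with `re.fullmatch`:

```python
import re

# Part B on its own: an exponent marker, an optional sign, and 1+ digits, all optional.
PART_B = re.compile(r"([eE][-+]?[0-9]+)?")

# The complete submitted expression (part A followed by part B).
SUBMITTED = re.compile(r"(([1-9]+\.[0-9]*)|([1-9]*\.[0-9]+)|([1-9]+))([eE][-+]?[0-9]+)?")

TARGETS = ["1.", ".2", "3.14", "5e6", "5e-6", "5E+6", "7.e8", "9.0E-10", ".11e12"]

# The empty suffix is accepted because the whole group is optional (part 4);
# a bare "e" or "e+" is rejected because at least one digit is required (part 3).
for suffix in ["", "e6", "e-6", "E+6", "E-10", "e", "e+"]:
    print(repr(suffix), PART_B.fullmatch(suffix) is not None)

# Every target should match the complete expression.
print(all(SUBMITTED.fullmatch(t) for t in TARGETS))
```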

Upvotes: 3

Ram Chandra Giri

Reputation: 147

@Kelly S. French, this regular expression matches all your test cases.

^[+-]?(\d+\.\d+|\d+\.|\.\d+|\d+)([eE][+-]?\d+)?$

Source: perldoc perlretut
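For reference, here is one way to run this pattern in Python. This is a sketch of my own, not something from perlretut. I added `re.ASCII` so that `\d` means only [0-9], because Python 3's `\d` also matches other Unicode decimal digits:

```python
import re

# The pattern above, anchors included; re.ASCII restricts \d to [0-9].
FLOAT_RE = re.compile(r"^[+-]?(\d+\.\d+|\d+\.|\.\d+|\d+)([eE][+-]?\d+)?$", re.ASCII)

TARGETS = ["1.", ".2", "3.14", "5e6", "5e-6", "5E+6", "7.e8", "9.0E-10", ".11e12"]

print(all(FLOAT_RE.match(t) for t in TARGETS))  # the nine targets
print(bool(FLOAT_RE.match("-10.25E+3")))        # optional sign, several integer digits
print(bool(FLOAT_RE.match(".")))                # a lone decimal point is rejected
```

In Python, `$` also matches just before a trailing newline, so `FLOAT_RE.fullmatch` is the stricter call if the input might end in "\n".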

Upvotes: 2

Heiko Schäfer

Reputation: 341

@Kelly S. French: the sign is missing because in a parser it would be handled by the unary minus (negation) expression, so it does not need to be detected as part of the float.
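Python's own grammar works this way, which makes it a convenient illustration. A float literal never includes a sign, and a leading minus shows up as a unary negation node wrapping the literal. Here is a short sketch using the standard `ast` module (Python 3.8+, where literals are `ast.Constant`):

```python
import ast

# Parse an expression without evaluating it; the sign is not part of the literal.
tree = ast.parse("-5.04e-10 + 3.14159E10", mode="eval")
print(ast.dump(tree.body))
```

The dump shows a `UnaryOp` with `USub` around `Constant(value=5.04e-10)`, combined with the second literal by an `Add`.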

Upvotes: 1

Ivan

Reputation: 11

'([-+])?\d*(\.)?\d+(([eE]([-+])?)?\d+)?'

That's the regular expression I arrived at when trying to solve this kind of task in Matlab. It won't correctly detect numbers like "1.", but some additional changes may solve the problem. Maybe the following would fix that:

'([-+])?(\d+(\.)?\d*|\d*(\.)?\d+)(([eE]([-+])?)?\d+)?'
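Both patterns can be compared directly. This is a minimal Python sketch. The Matlab strings are copied unchanged, which works because they only use syntax that Python's `re` also accepts. `re.fullmatch` stands in for whole-string matching:

```python
import re

# The two patterns from this answer, copied unchanged.
FIRST = r"([-+])?\d*(\.)?\d+(([eE]([-+])?)?\d+)?"
SECOND = r"([-+])?(\d+(\.)?\d*|\d*(\.)?\d+)(([eE]([-+])?)?\d+)?"

# Columns: input, matched by FIRST, matched by SECOND.
for text in ["1.", "7.e8", "3.14", "-5e-6"]:
    print(text, bool(re.fullmatch(FIRST, text)), bool(re.fullmatch(SECOND, text)))
```

The first pattern rejects "1." and "7.e8", because it needs a digit after the decimal point. The second accepts both.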

Upvotes: 1

Kelly S. French

Reputation: 12334

[This is the answer from the professor]

Define:

N = [1-9]
D = 0 | N
E = [eE] [+-]? D+
L = 0 | ( N D* )

Then floating point numbers can be matched with:

( ( L . D* | . D+ ) E? ) | ( L E )
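The definitions translate almost one-to-one into a regular expression if each name is treated as a string fragment. Here is a Python sketch under that assumption. The `(?:...)` groups are non-capturing and only keep the alternations scoped. The literal '.' in the grammar becomes `\.`:

```python
import re

# Building blocks from the definitions, written as regex fragments.
N = r"[1-9]"
D = r"[0-9]"                 # D = 0 | N
E = rf"[eE][+-]?{D}+"        # exponent
L = rf"(?:0|{N}{D}*)"        # integer part without leading zeros

# ( ( L . D* | . D+ ) E? ) | ( L E )
FLOAT = re.compile(rf"(?:(?:{L}\.{D}*|\.{D}+)(?:{E})?)|(?:{L}{E})")

TARGETS = ["1.", ".2", "3.14", "5e6", "5e-6", "5E+6", "7.e8", "9.0E-10", ".11e12"]

print(all(FLOAT.fullmatch(t) for t in TARGETS))
print([s for s in ["10.5", "0.5", "05.5", "5", "."] if FLOAT.fullmatch(s)])
```

Because L allows several digits and a lone 0, "10.5" and "0.5" are accepted. A leading zero such as "05.5", a plain integer "5", and a lone "." are not.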

It was also acceptable to use D+ rather than L, and to prepend [+-]?.

A common mistake was to write D* . D*, but this can match just '.'.
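Both remarks are easy to check. In this Python sketch, `VARIANT` is one way of writing the accepted variant with D+ in place of L and a leading [+-]?. That spelling is my assumption, not the professor's text. `NAIVE` is the mistaken D* . D*:

```python
import re

# Accepted variant: D+ in place of L (leading zeros allowed) and an optional sign.
VARIANT = re.compile(
    r"[+-]?(?:(?:[0-9]+\.[0-9]*|\.[0-9]+)(?:[eE][+-]?[0-9]+)?|[0-9]+[eE][+-]?[0-9]+)"
)

# Common mistake: D* . D* lets both digit runs be empty.
NAIVE = re.compile(r"[0-9]*\.[0-9]*")

# Columns: input, matched by VARIANT, matched by NAIVE.
for text in [".", "05.5", "-1.", "+.2e3"]:
    print(repr(text), bool(VARIANT.fullmatch(text)), bool(NAIVE.fullmatch(text)))
```

Only `NAIVE` accepts a bare "."; the variant also accepts a sign and leading zeros.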

[Edit]
Someone asked about a leading sign. I should have asked the professor why it was excluded, but never got the chance. Since this was part of the lecture on grammars, my guess is that either it made the problem easier (not likely) or there is a small detail in parsing where you divide the problem so that the floating point value, regardless of sign, is the focus (possible).

If you are parsing an expression such as

-5.04e-10 + 3.14159E10

the sign of the floating point value is part of the operation to be applied to the value and not an attribute of the number itself. In other words,

subtract (5.04e-10)
add (3.14159E10)

to form the result of the expression. While I'm sure mathematicians may argue the point, remember this was from a lecture on parsing.
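A toy lexer makes this concrete. The float rule has no sign, so a '-' is always its own token, and the parser is what decides to subtract. This is a hypothetical Python sketch. The token names, `operations`, and the flat "operators then literal" grammar are all assumptions. It does no real error checking beyond rejecting unknown characters:

```python
import re

# Hypothetical lexer rules: an unsigned float literal (the professor's grammar
# with D+ in place of L) and the operators + and -. Leading whitespace is skipped.
FLOAT = r"(?:[0-9]+\.[0-9]*|\.[0-9]+)(?:[eE][+-]?[0-9]+)?|[0-9]+[eE][+-]?[0-9]+"
TOKEN_RE = re.compile(rf"\s*(?:(?P<FLOAT>{FLOAT})|(?P<OP>[+-]))")

def operations(text: str) -> list[tuple[str, float]]:
    """Turn 'ops literal ops literal ...' into (operation, unsigned value) pairs."""
    ops, sign, pos = [], 1, 0
    text = text.rstrip()
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if m is None:
            raise SyntaxError(f"unexpected input at position {pos}")
        if m.lastgroup == "OP":
            # A '-' flips the pending sign; a '+' leaves it unchanged.
            if m.group("OP") == "-":
                sign = -sign
        else:
            # The literal never carries a sign; the sign becomes the operation.
            ops.append(("add" if sign > 0 else "subtract", float(m.group("FLOAT"))))
            sign = 1
        pos = m.end()
    return ops

ops = operations("-5.04e-10 + 3.14159E10")
print(ops)
print(sum(v if op == "add" else -v for op, v in ops))
```

For the expression above, this yields a subtract of 5.04e-10 and an add of 3.14159E10, mirroring the two lines listed earlier. Repeated signs are folded together, so "1. - -2." becomes two additions.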

Upvotes: 8

Alex Martelli

Reputation: 881715

Just make both the decimal dot and the E-then-exponent part optional:

[1-9][0-9]*\.?[0-9]*([Ee][+-]?[0-9]+)?

I don't see why you don't want a leading [+-]? to capture a possible sign too, but, whatever!-)

Edit: there might in fact be no digits left of the decimal point (in which case I imagine there must be the decimal point and 1+ digits after it!), so a vertical-bar (alternative) is clearly needed:

(([1-9][0-9]*\.?[0-9]*)|(\.[0-9]+))([Ee][+-]?[0-9]+)?
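Here is a quick Python check of the two versions against the question's targets, assuming whole-string matching with `re.fullmatch`:

```python
import re

TARGETS = ["1.", ".2", "3.14", "5e6", "5e-6", "5E+6", "7.e8", "9.0E-10", ".11e12"]

# First version and the edited version with the leading-dot alternative.
ORIGINAL = re.compile(r"[1-9][0-9]*\.?[0-9]*([Ee][+-]?[0-9]+)?")
EDITED = re.compile(r"(([1-9][0-9]*\.?[0-9]*)|(\.[0-9]+))([Ee][+-]?[0-9]+)?")

# Targets each version fails to match in full.
print([t for t in TARGETS if not ORIGINAL.fullmatch(t)])
print([t for t in TARGETS if not EDITED.fullmatch(t)])
```

The first version misses ".2" and ".11e12", and the edited one matches all nine. Both still require the first integer digit to be 1-9, so "0.5" is rejected.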

Upvotes: 26
