Eduard

Reputation: 141

Regular expression for the language of Decimals

I was told that the answer to the task "Write down a RE for the language of Decimals" goes something like this:

Decimals = (0+D(Z)*)+((0+D(Z)*).(0+D(Z)*D))

D = 1+2+3+4+5+6+7+8+9
       
Z = D+0

Note: I am using '+' for union.
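As an illustration (not part of the original task), the definitions above can be transcribed into Python's `re` syntax. This sketch assumes `D` maps to the class `[1-9]`, `Z` to any digit, union `+` to `|`, and the literal dot to `\.`. Each definition is wrapped in a non-capturing group so that substituting it cannot change precedence:

```python
import re

# Hypothetical transcription of the question's definitions into Python regex syntax.
D = "[1-9]"                 # D = 1+2+...+9
Z = f"(?:{D}|0)"            # Z = D+0, i.e. any digit
INT = f"(?:0|{D}{Z}*)"      # 0+D(Z)*  : 0, or a number with no leading zero
FRAC = f"(?:0|{D}{Z}*{D})"  # 0+D(Z)*D : 0, or 2+ digits starting and ending nonzero
DECIMALS = re.compile(f"{INT}|{INT}\\.{FRAC}")

def is_decimal(s: str) -> bool:
    """True if the whole string belongs to the language Decimals."""
    return DECIMALS.fullmatch(s) is not None

for s in ["0", "42", "3.0", "3.25", "3.5", "007", "8.30"]:
    print(s, is_decimal(s))
```

Read this way, `3.5` is rejected: a non-zero fraction built from D(Z)*D needs at least two digits.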

I thought we could just have this: (0+D(Z)*).(0+D(Z)*D)

Why do we need (0+D(Z)*) before the union as well? I thought that would only make sense if the language were meant to cover any positive number, integer or decimal. Can someone explain what is going on?

Thanks in advance.

Upvotes: 0

Views: 382

Answers (2)

user13843220

The basic template regex for a decimal number, mirroring how a string-to-number conversion function parses its input, is:

(?:\d+(?:\.\d*)?|\.\d+)

It accepts the same plain decimal forms (unsigned, no exponent) that a typical string-to-number conversion accepts in most programming languages, so it is essentially a validation pattern.

Expanded:

 (?:
    \d+                # one or more digits (required)
    (?: \. \d* )?      # optionally a dot, then optional digits
  |                    # or
    \. \d+             # a dot followed by one or more digits
 )
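To see the "mirrors a string-to-number conversion" claim in practice, here is a small Python sketch that compares the template against Python's built-in `float()` on a handful of sample strings. The samples are an arbitrary choice, and `float()` also accepts forms the template deliberately leaves out (a sign, an exponent, `inf`/`nan`, surrounding whitespace), so agreement is only expected on plain unsigned decimals like these:

```python
import re

# The template regex from above, written on one line.
TEMPLATE = re.compile(r"(?:\d+(?:\.\d*)?|\.\d+)")

def template_ok(s: str) -> bool:
    """True if the whole string matches the template."""
    return TEMPLATE.fullmatch(s) is not None

def float_ok(s: str) -> bool:
    """True if Python's float() can convert the string."""
    try:
        float(s)
        return True
    except ValueError:
        return False

samples = ["42", "87.", ".215", "8.30", "0.0", ".", "1.2.3", "", "abc"]
for s in samples:
    print(repr(s), template_ok(s), float_ok(s))
```

An input such as `1e5` shows the intended gap: `float()` accepts it, while the template does not.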

Edit:

This is an analysis of your "Language of Decimals" regex, showing its fallacies.

Definitions from your language:

Decimals (your formula) = (0+D(Z)*)+((0+D(Z)*).(0+D(Z)*D))

D = 1+2+3+4+5+6+7+8+9

Z = D+0

+ = Union

==============================

Decimal language substitutions (each D is wrapped in parentheses so that substituting it does not change precedence):

Decimals(D) =
(0+D(D+0)*)+((0+D(D+0)*).(0+D(D+0)*D))
Decimals(num) =
(0+(1+2+3+4+5+6+7+8+9)(1+2+3+4+5+6+7+8+9+0)*)+((0+(1+2+3+4+5+6+7+8+9)(1+2+3+4+5+6+7+8+9+0)*).(0+(1+2+3+4+5+6+7+8+9)(1+2+3+4+5+6+7+8+9+0)*(1+2+3+4+5+6+7+8+9)))

==============================

Decimals-to-regex definitions:

+ = | alternation (union)

* = * quantifier (0 to many)

. = \. literal dot

() = () grouping
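The substitution can also be done mechanically. The sketch below uses a hypothetical helper, `to_regex`, written only for this illustration. It expands `D` and `Z` into parenthesized unions first, so that substitution cannot change precedence, then maps `+` to `|` and `.` to `\.` according to the definitions above. It assumes the input uses only digits, `D`, `Z`, `+`, `*`, `.` and parentheses, as in the question:

```python
import re

def to_regex(expr: str) -> str:
    """Hypothetical translator from the question's RE notation to Python regex."""
    # Expand the named sets, each wrapped in parentheses to preserve precedence.
    # Z is expanded first because its expansion mentions D.
    expr = expr.replace("Z", "(D+0)")
    expr = expr.replace("D", "(1+2+3+4+5+6+7+8+9)")
    # Map the notation onto regex syntax: '+' is union, '.' is a literal dot.
    out = []
    for ch in expr:
        if ch == "+":
            out.append("|")
        elif ch == ".":
            out.append(r"\.")
        else:
            out.append(ch)  # digits, '*', '(' and ')' carry over unchanged
    return "".join(out)

DECIMALS = "(0+D(Z)*)+((0+D(Z)*).(0+D(Z)*D))"
pattern = re.compile(to_regex(DECIMALS))
print(pattern.pattern)

for s in ["42", "3.25", "007", "3.5"]:
    print(s, pattern.fullmatch(s) is not None)
```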

Regex substitutions:

Decimals(Regex) =

(0|(1|2|3|4|5|6|7|8|9)(1|2|3|4|5|6|7|8|9|0)*)|((0|(1|2|3|4|5|6|7|8|9)(1|2|3|4|5|6|7|8|9|0)*)\.(0|(1|2|3|4|5|6|7|8|9)(1|2|3|4|5|6|7|8|9|0)*(1|2|3|4|5|6|7|8|9)))

Decimals(Regex - Class factored)1 =
(0|[1-9][0-9]*)|((0|[1-9][0-9]*)\.(0|[1-9][0-9]*[1-9]))

Decimals(Regex - Class factored)2 =
(0|[1-9][0-9]*)(\.(0|[1-9][0-9]*[1-9]))?

Note that 0 and [1-9] cannot be merged into [0-9] here, because the star only follows the [1-9] branch.

The fallacy of your "Language of Decimals":

 (?: 0 | [1-9] [0-9]* )            # integer part
 (                                 # (1 start)
    \. 
    (?: 0 | [1-9] [0-9]* [1-9] )   # fraction part
 )?                                # (1 end)

OK:

  • The dot group is optional

Not ok:

  • The fraction must be exactly 0 or have at least 2 digits
  • A multi-digit fraction may not start or end with a zero
  • The integer part may not have leading zeros
  • If the dot group is present, it requires digits before it
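Hand factoring like this is easy to get wrong, so it can be worth checking by brute force. The Python sketch below (an illustration, not part of the original answer) enumerates every string up to length 5 over the small alphabet `0`, `1`, `9`, `.` and reports the strings on which a candidate pattern disagrees with the literal, fully parenthesized translation. Two candidates are compared, labeled `naive` and `strict` purely for this example: `[0-9]+(\.[0-9]+[1-9])?` and `(0|[1-9][0-9]*)(\.(0|[1-9][0-9]*[1-9]))?`. The alphabet and length bound are arbitrary choices, so an empty result is evidence, not a proof:

```python
import itertools
import re

# Literal translation: each digit union kept as an explicit parenthesized group.
D = "(1|2|3|4|5|6|7|8|9)"
Z = f"({D}|0)"
LITERAL = re.compile(f"(0|{D}{Z}*)|((0|{D}{Z}*)\\.(0|{D}{Z}*{D}))")

CANDIDATES = {
    "naive": re.compile(r"[0-9]+(\.[0-9]+[1-9])?"),
    "strict": re.compile(r"(0|[1-9][0-9]*)(\.(0|[1-9][0-9]*[1-9]))?"),
}

def disagreements(candidate, alphabet="019.", max_len=5):
    """Strings up to max_len on which candidate and LITERAL disagree."""
    found = []
    for n in range(1, max_len + 1):
        for chars in itertools.product(alphabet, repeat=n):
            s = "".join(chars)
            if (candidate.fullmatch(s) is None) != (LITERAL.fullmatch(s) is None):
                found.append(s)
    return found

for name, pat in CANDIDATES.items():
    diffs = disagreements(pat)
    print(name, len(diffs), diffs[:5])
```

Strings such as `00`, `0.0` and `1.09` separate the two candidates: the literal translation rejects `00` and `1.09` but accepts `0.0`.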

Conclusion:

This regex won't match numbers like
87., .215, .6, .077, 44.2, or 8.30
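The examples above can be checked directly. This Python sketch assumes `(0|[1-9][0-9]*)(\.(0|[1-9][0-9]*[1-9]))?` as a faithful transcription of the question's language (each union of digits written as a character class) and compares it with the template regex from the top of this answer:

```python
import re

# Assumed transcription of the question's language, and the general template.
LANGUAGE = re.compile(r"(0|[1-9][0-9]*)(\.(0|[1-9][0-9]*[1-9]))?")
TEMPLATE = re.compile(r"(?:\d+(?:\.\d*)?|\.\d+)")

examples = ["87.", ".215", ".6", ".077", "44.2", "8.30"]
for s in examples:
    in_language = LANGUAGE.fullmatch(s) is not None
    in_template = TEMPLATE.fullmatch(s) is not None
    print(f"{s!r:8} language={in_language} template={in_template}")
```

Every example is rejected by the language and accepted by the template.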

Since a general string-to-float conversion accepts all of the above, your regex describes a special case suited to DISPLAY purposes only; it should not be used to validate input for a general numeric parser.

Calling this regex the "Language of Decimals" is therefore misleading in the general sense.

Upvotes: 2

Barmar

Reputation: 780842

In the Decimal language they're defining, the fraction is optional. Your regular expression requires the decimal point.

The first alternative (0+D(Z)*) matches a number without a decimal point. The second alternative ((0+D(Z)*).(0+D(Z)*D)) matches a number with a decimal point.
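A quick way to see the difference is to compile both versions and try an integer and a decimal. In this Python sketch, `FULL` and `PROPOSED` are illustrative names for the full language and the asker's shorter version, with `D` written as `[1-9]` and `Z` as `[0-9]`:

```python
import re

INT = r"(?:0|[1-9][0-9]*)"        # 0+D(Z)*
FRAC = r"(?:0|[1-9][0-9]*[1-9])"  # 0+D(Z)*D

FULL = re.compile(rf"{INT}|{INT}\.{FRAC}")  # both alternatives
PROPOSED = re.compile(rf"{INT}\.{FRAC}")    # decimal-point alternative only

for s in ["42", "0", "3.25"]:
    print(s, FULL.fullmatch(s) is not None, PROPOSED.fullmatch(s) is not None)
```

`42` and `0` are accepted only by `FULL`, because `PROPOSED` always demands the decimal point; `3.25` is accepted by both.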

Upvotes: 2
