jojo_Berlin

Reputation: 693

Regex for matching different float formats

I'm looking for a regex in Scala that matches several float formats:

    9,487,346 -> should match
    9.487.356,453 -> should match
    38,4 -> should match
    -38,4 -> should match
    -38.5
    -9,487,346.76
    -38 -> should match
     

So basically it should match a number that:

  1. possibly has thousands separators (either comma or dot)
  2. possibly has a decimal part, again with either comma or dot as the separator

Currently I'm stuck with

    val pattern="\\d+((\\.\\d{3}+)?(,\\d{1,2}+)?|(,\\d{3}+)?(\\.\\d{1,2}+)?)" 

Edit: I'm mostly concerned with European notation.

Example that the current pattern does not match: 1,052,161

I guess it would be close enough to check that the string contains only digits, a sign, commas and dots.

Upvotes: 0

Views: 193

Answers (2)

degant

Reputation: 4981

Based on your rules,

It should match a number that:

  • possibly has thousands separators (either comma or dot)
  • possibly has a decimal part, again with either comma or dot as the separator

Regex:

    ^[+-]?\d{1,3}(?:[,.]\d{3})*(?:[,.]\d+)?$

  • [+-]? allows + or - or nothing at the start
  • \d{1,3} allows one to three digits
  • (?:[,.]\d{3})* allows . or , as thousands separator followed by exactly 3 digits (the * allows any number of such groups, including none)
  • (?:[,.]\d+)? allows . or , as decimal separator followed by at least one digit

This matches all of the OP's example cases. Take a look at the demo below for more:

Regex101 Demo
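For a quick local check in Scala, a minimal sketch could look like the following. The names LooseNumberCheck, loose and isLooseNumber are made up for illustration. String.matches already requires the whole input to match, so the ^ and $ anchors are left out here.

```scala
object LooseNumberCheck {
  // Either separator may act as thousands or decimal separator.
  // String.matches only succeeds on a full match, so no anchors are needed.
  val loose: String = """[+-]?\d{1,3}(?:[,.]\d{3})*(?:[,.]\d+)?"""

  def isLooseNumber(s: String): Boolean = s.matches(loose)

  def main(args: Array[String]): Unit = {
    // The sample inputs from the question.
    val samples = List(
      "9,487,346", "9.487.356,453", "38,4", "-38,4",
      "-38.5", "-9,487,346.76", "-38"
    )
    samples.foreach(s => println(s"$s -> ${isLooseNumber(s)}"))
  }
}
```

An ungrouped number such as 1234 is rejected by this pattern, because at most three digits may appear before the first separator.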

One limitation, however, is that it accepts either . or , in both roles and does not check that they are used consistently: if , is the thousands separator, then . should be the decimal separator, and vice versa. As a result, the cases below are incorrectly reported as matches:

    201,350,780,88
    211.950.266.4

To fix this as well, the regex can be split into two alternatives: one for the notation that uses , as thousands separator and . as decimal separator, and one for the reverse. Regex:

    ^[+-]?\d{1,3}(?:(?:(?:\.\d{3})*(?:\,\d+)?)|(?:(?:\,\d{3})*(?:\.\d+)?))$

Regex101 Demo
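As a rough Scala sketch of how the stricter pattern might be used, the snippet below compiles it once and checks whole strings against it. StrictNumberCheck, strict and isStrictNumber are illustrative names only, not part of any library.

```scala
import scala.util.matching.Regex

object StrictNumberCheck {
  // Same pattern as above: the first alternative is dot-grouped with a
  // comma decimal, the second is comma-grouped with a dot decimal.
  val strict: Regex =
    """^[+-]?\d{1,3}(?:(?:(?:\.\d{3})*(?:\,\d+)?)|(?:(?:\,\d{3})*(?:\.\d+)?))$""".r

  // Use the underlying java.util.regex.Pattern so this works on any Scala version.
  def isStrictNumber(s: String): Boolean =
    strict.pattern.matcher(s).matches()

  def main(args: Array[String]): Unit = {
    val samples = List("9.487.356,453", "-9,487,346.76", "201,350,780,88", "211.950.266.4")
    samples.foreach(s => println(s"$s -> ${isStrictNumber(s)}"))
  }
}
```

With this version the two mixed-up inputs listed earlier should no longer match, while consistently formatted numbers in either notation still do.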

Hope this helps!

Upvotes: 0

jwvh

Reputation: 51271

If, as your edit suggests, you are willing to accept a string that simply "contains numbers, sign, comma and dot" then the task is trivial.

    [+-]?\d[\d.,]*
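To make the trade-off concrete, here is a small Scala sketch of that character-set check. The names CharsetCheck and looksNumeric are invented for this example. The pattern only restricts which characters may appear, so malformed input is accepted too.

```scala
object CharsetCheck {
  // Optional sign, one leading digit, then any mix of digits, dots and commas.
  def looksNumeric(s: String): Boolean = s.matches("""[+-]?\d[\d.,]*""")

  def main(args: Array[String]): Unit = {
    // "1,,..2" is not a sensible number but still passes this check.
    List("1,052,161", "-38,4", "1,,..2", "abc").foreach { s =>
      println(s"$s -> ${looksNumeric(s)}")
    }
  }
}
```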

update

After thinking it over, and considering some options, I realize that your original request is possible if you'll allow for 2 different RE patterns, one for US-style numbers (commas before dot) and one for Euro-style numbers (dots before comma).

    def isValidNum(num: String): Boolean =
      num.matches("[+-]?\\d{1,3}(,\\d{3})*(\\.\\d+)?") ||
        num.matches("[+-]?\\d{1,3}(\\.\\d{3})*(,\\d+)?")

Note that the thousands separators are not optional: a number with more than three digits before the decimal separator, like "1234", is not evaluated as valid. That can be changed by adding another RE pattern: || num.matches("[+-]?\\d+")
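Put together, the extended check might look like the sketch below. The wrapper object NumCheck is only there to make the snippet self-contained, and the third pattern is the plain-digits alternative mentioned above.

```scala
object NumCheck {
  def isValidNum(num: String): Boolean =
    num.matches("[+-]?\\d{1,3}(,\\d{3})*(\\.\\d+)?") ||  // US style: 1,234.5
      num.matches("[+-]?\\d{1,3}(\\.\\d{3})*(,\\d+)?") || // Euro style: 1.234,5
      num.matches("[+-]?\\d+")                            // ungrouped integer: 1234

  def main(args: Array[String]): Unit =
    List("1234", "1,234.5", "1.234,5", "12,34,56").foreach { s =>
      println(s"$s -> ${isValidNum(s)}")
    }
}
```

The extra pattern only covers integers, so an ungrouped number with a decimal part such as 1234.5 would still be rejected unless yet another alternative is added.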

Upvotes: 1
