onassar

Reputation: 3558

Regular expression that matches n numeric characters, along with 1 or 2 non-numeric characters

I've been using a standard ([0-9]+) pattern to match numbers in a string, but have a strange edge case now. I want to match the following:

123
456
.123
123.
%123
31st
14th
2nd
100.55
555.10

In the above cases, the non-numeric characters are:
.
%
s
t
h
n
d

But they could be a variety of characters.
Thoughts?

JS or PHP would be great.

Upvotes: 1

Views: 158

Answers (4)

sp00m

Reputation: 48827

This one suits your needs:

^([.]|%)?\d+(((?<=^1)|(?<!^1)1)st|(?<!^1)((?<=^2)|2)nd|(?<!^1)((?<=^3)|3)rd|th|[.]\d*)?$


^                            # start of the string
([.]|%)?                     # . or % {0 or 1 time}
\d+                          # any digit {1 or more times}
(
    ((?<=^1)|(?<!^1)1)st     # either (1) or (ending with 1 but not 11) followed by st
    |(?<!^1)((?<=^2)|2)nd    # either (2) or (ending with 2 but not 12) followed by nd
    |(?<!^1)((?<=^3)|3)rd    # either (3) or (ending with 3 but not 13) followed by rd
    |th                      # th
    |[.]\d*                  # . followed by (a digit {0 or more times})
)?                           # {0 or 1 time}
$                            # end of the string

Note that JS engines without lookbehind support (?<= and ?<!) won't be able to understand this regexp; lookbehinds were only added to JavaScript in ES2018.
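For anyone who wants to try the pattern, here is a minimal sketch in JavaScript. It assumes an engine with lookbehind support (ES2018 or later, for example a recent Node.js). The names `ordinalPattern` and `matchesPattern` are made up for this illustration.

```javascript
// Assumes an ES2018+ engine, since the pattern relies on lookbehind assertions.
// `ordinalPattern` and `matchesPattern` are illustrative names only.
const ordinalPattern = /^([.]|%)?\d+(((?<=^1)|(?<!^1)1)st|(?<!^1)((?<=^2)|2)nd|(?<!^1)((?<=^3)|3)rd|th|[.]\d*)?$/;

function matchesPattern(text) {
  return ordinalPattern.test(text);
}

// The ten inputs listed in the question.
const questionSamples = ["123", "456", ".123", "123.", "%123", "31st", "14th", "2nd", "100.55", "555.10"];
for (const sample of questionSamples) {
  console.log(sample, matchesPattern(sample));
}

// Mismatched ordinal suffixes are expected to be rejected.
for (const sample of ["11st", "12nd"]) {
  console.log(sample, matchesPattern(sample));
}
```

The pattern has no `g` flag, so calling `test` repeatedly is safe: there is no `lastIndex` state carried between calls.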

Upvotes: 1

Matthew

Reputation: 9949

If you are looking for validation of patterns you expect (like dates?) you can do this:

[\d.%]*(nd|st|th|rd){0,1}

If you know that the % is always leading, that there is at most one decimal point, or that a decimal number never takes st/nd/etc., you can refine it like this:

([%]){0,1}[\d]*((((\.[\d]+){0,1}){0,1})|((nd|st|th|rd){0,1}))

This still does not handle a space anywhere, but you can probably see how to add that. You may also want to distinguish 1st from 11th and so on. If you want to tighten the validation further, you can move toward something like this for the day of a date (you may be able to find a better one by searching):

([023]){0,1}1st|([02]){0,1}2nd|([02]){0,1}3rd|(11|12|13|30|(([012]){0,1}(([4-9])|0))th)

There are some extra brackets there to make it as clear as possible.
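If encoding the 1st versus 11th rule in the regex gets unwieldy, one alternative (not what the patterns above do) is to match any suffix with a simple pattern and then check it in code. A rough JavaScript sketch follows; `expectedSuffix` and `isValidOrdinal` are hypothetical helper names.

```javascript
// Hypothetical helpers: work out which suffix a number should take,
// then compare it with the suffix actually present in the text.
function expectedSuffix(n) {
  const lastTwo = n % 100;
  if (lastTwo >= 11 && lastTwo <= 13) return "th"; // 11th, 12th, 13th, 111th, ...
  switch (n % 10) {
    case 1: return "st";
    case 2: return "nd";
    case 3: return "rd";
    default: return "th";
  }
}

function isValidOrdinal(text) {
  const match = /^(\d+)(st|nd|rd|th)$/.exec(text);
  if (match === null) return false;
  // Only the last two digits decide the suffix, which also avoids
  // precision loss on very long digit strings.
  const lastTwoDigits = Number(match[1].slice(-2));
  return expectedSuffix(lastTwoDigits) === match[2];
}

for (const sample of ["31st", "2nd", "14th", "11th", "11st", "1nd"]) {
  console.log(sample, isValidOrdinal(sample));
}
```

This keeps the regex short and puts the English ordinal rules in one readable function, at the cost of no longer being a pure regex solution.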

Upvotes: 1

zzzzBov

Reputation: 179096

Start with a pattern that gets you what you want:

\d+

Now you also want to match decimal numbers, so expand your options:

This one matches whole numbers followed by an optional decimal point:

\d+\.?

This one matches decimal numbers:

\d*\.\d+

Joining both gives you a solid number-matching pattern (it may still be too permissive if you don't want to match numbers like 000.0000):

(?:\d+\.?|\d*\.\d+)
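As a quick check of the joined pattern, here is a small JavaScript sketch. The `^` and `$` anchors are an assumption added here so that each sample has to match as a whole, and `numberPattern` and `isNumberLike` are illustrative names.

```javascript
// The anchors are added for this sketch; the pattern in the text is unanchored.
const numberPattern = /^(?:\d+\.?|\d*\.\d+)$/;

function isNumberLike(text) {
  return numberPattern.test(text);
}

// Includes 000.0000 to show the permissive case mentioned above,
// plus two inputs that should not count as numbers.
for (const sample of ["123", "123.", ".123", "100.55", "000.0000", ".", "1.2.3"]) {
  console.log(sample, isNumberLike(sample));
}
```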

Now comes the tricky part. You need to determine exactly which other characters may prefix or suffix the number.

Given the example, I will make the following assumptions:

  • % may prefix a decimal, but without a suffix
  • st, nd, rd, and th may suffix only whole numbers

Given these assumptions:

% characters can be optionally matched on decimals:

(?:%?(?:\d+\.?|\d*\.\d+))

Whole numbers with suffixes can be matched with the following (this does not validate the suffixes, so 1nd would be valid):

(?:\d+(?:st|nd|rd|th))

Joining these two patterns produces:

(?:(?:%?(?:\d+\.?|\d*\.\d+))|(?:\d+(?:st|nd|rd|th)))

Of course, you'll probably want to restrict the match to the entire string:

/^(?:(?:%?(?:\d+\.?|\d*\.\d+))|(?:\d+(?:st|nd|rd|th)))$/
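To see how the anchored pattern behaves under the assumptions above, here is a short JavaScript sketch that splits a list of inputs into accepted and rejected. The helper name `partitionByPattern` and the extra test inputs are made up for this illustration.

```javascript
const fullPattern = /^(?:(?:%?(?:\d+\.?|\d*\.\d+))|(?:\d+(?:st|nd|rd|th)))$/;

// Hypothetical helper: sort inputs into those the pattern accepts and rejects.
function partitionByPattern(inputs) {
  const accepted = [];
  const rejected = [];
  for (const input of inputs) {
    (fullPattern.test(input) ? accepted : rejected).push(input);
  }
  return { accepted, rejected };
}

const inputs = [
  "123", "456", ".123", "123.", "%123", "31st", "14th", "2nd", "100.55", "555.10",
  "1nd",     // accepted: suffixes are not validated against the number
  "%31st",   // rejected: % and a suffix cannot be combined
  "%%123%%", // rejected: only a single leading % is allowed
];

const { accepted, rejected } = partitionByPattern(inputs);
console.log("accepted:", accepted);
console.log("rejected:", rejected);
```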

Upvotes: 3

zessx

Reputation: 68790

I tried to create a separate rule for each case:

(\d+(?:\.\d*)?)        // 123 ; 123. ; 123.45
([%.]\d+)              // %123 ; .123
(\d+(?:st|nd|th))      // 31st ; 2nd ; 14th

Then combined:

((?:\d+(?:\.\d*)?)|(?:[%.]\d+)|(?:\d+(?:st|nd|th)))

If you want something shorter, you can simply use ([%.\dsthnd]+), but this will also catch many unwanted entries, like %%123%%.
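The trade-off between the two approaches can be seen in a small JavaScript sketch. Both patterns are anchored with `^` and `$` here, which is an assumption made so that whole strings are compared, and the character class is written with `t` included so that 31st and 14th pass. The names `combinedPattern` and `shortcutPattern` are illustrative.

```javascript
// Anchors added for this sketch so each input must match in full.
const combinedPattern = /^((?:\d+(?:\.\d*)?)|(?:[%.]\d+)|(?:\d+(?:st|nd|th)))$/;
// Character-class shortcut; `t` is included here so that "st" and "th" fit.
const shortcutPattern = /^([%.\dsthnd]+)$/;

const testInputs = [
  "123", "456", ".123", "123.", "%123", "31st", "14th", "2nd", "100.55", "555.10",
  "%%123%%", // unwanted entry that only the shortcut lets through
];

for (const input of testInputs) {
  console.log(input, combinedPattern.test(input), shortcutPattern.test(input));
}
```

Note that the combined pattern lists only st, nd and th, so an input such as 3rd would need `rd` added to the suffix group.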

Upvotes: 2
