Reputation: 3558
I've been using a standard ([0-9]+) pattern to match numbers in a string, but have a strange edge case now. I want to match the following:
123
456
.123
123.
%123
31st
14th
2nd
100.55
555.10
In the above cases, the non-numeric characters are:
.
%
s
h
n
d
But they could be a variety of characters.
Thoughts?
JS or PHP would be great.
Upvotes: 1
Views: 158
Reputation: 48827
This one suits your needs:
^([.]|%)?\d+(((?<=^1)|(?<!^1)1)st|(?<!^1)((?<=^2)|2)nd|(?<!^1)((?<=^3)|3)rd|th|[.]\d*)?$
^ # start of the string
([.]|%)? # . or % {0 or 1 time}
\d+ # any digit {1 or more times}
(
((?<=^1)|(?<!^1)1)st # either (1) or (ending with 1 but not 11) followed by st
|(?<!^1)((?<=^2)|2)nd # either (2) or (ending with 2 but not 12) followed by nd
|(?<!^1)((?<=^3)|3)rd # either (3) or (ending with 3 but not 13) followed by rd
|th # th
|[.]\d* # . followed by (a digit {0 or more times})
)? # {0 or 1 time}
$ # end of the string
Note that older JS engines won't be able to run this regexp: lookbehinds ((?<= and (?<!) were only added to JavaScript in ES2018.
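Where lookbehind is not available, one alternative is to keep the regex simple and check the suffix in code. The following is only a sketch under that assumption; ordinalSuffix and isValidOrdinal are hypothetical helper names, not part of the answer above, and only the ordinal case (not the % or decimal cases) is covered.

```javascript
// Hypothetical helper: English ordinal suffix for a non-negative integer.
function ordinalSuffix(n) {
  const lastTwo = n % 100;
  if (lastTwo >= 11 && lastTwo <= 13) return "th"; // 11th, 12th, 13th, 111th, ...
  switch (n % 10) {
    case 1: return "st";
    case 2: return "nd";
    case 3: return "rd";
    default: return "th";
  }
}

// Hypothetical helper: true when the text is digits followed by the matching suffix.
function isValidOrdinal(text) {
  const m = /^(\d+)(st|nd|rd|th)$/.exec(text);
  if (m === null) return false;
  // Only the last two digits decide the suffix, so very long numbers are safe too.
  const lastTwoDigits = Number(m[1].slice(-2));
  return ordinalSuffix(lastTwoDigits) === m[2];
}

console.log(isValidOrdinal("31st")); // true
console.log(isValidOrdinal("11st")); // false
```

Because the suffix is computed from the last two digits, this also rejects inputs such as 111st.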
Upvotes: 1
Reputation: 9949
If you are looking for validation of patterns you expect (like dates?) you can do this:
[\d.%]*(nd|st|th){0,1}
If you know that the % is always leading, that there is at most one decimal point, or that a decimal never takes st/nd/etc., you can refine it like this:
([%]){0,1}[\d]*((((\.[\d]+){0,1}){0,1})|((nd|st|th|rd){0,1}))
I am still not handling spaces anywhere, but you can probably see how to add that. You may also want to make sure the suffix is right (1st versus 11th, etc.). If you want to tighten the validation further, for dates you can move toward something like this (you may be able to find a better one by searching):
([023]){0,1}1st|([02]){0,1}2nd|([02]){0,1}3rd|(11|12|13|30|(([012]){0,1}(([4-9])|0))th)
There are some extra brackets in there to make it as clear as possible.
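To see what the refined pattern accepts, it can be anchored with ^ and $ and run against the samples from the question. This is a rough check rather than a full test suite; the anchors and the variable names refined and samples are additions for illustration.

```javascript
// Refined pattern from above, wrapped in ^...$ (the anchors are an assumption;
// the pattern as posted is unanchored).
const refined = /^([%]){0,1}[\d]*((((\.[\d]+){0,1}){0,1})|((nd|st|th|rd){0,1}))$/;

// Sample inputs taken from the question.
const samples = ["123", "456", ".123", "123.", "%123",
                 "31st", "14th", "2nd", "100.55", "555.10"];

// Collect every sample the anchored pattern does not accept.
const rejected = samples.filter((s) => !refined.test(s));
console.log(rejected); // [ '123.' ]

// Every part of the pattern is optional, so the empty string is accepted too.
console.log(refined.test("")); // true
```

The trailing-dot case fails because the decimal part requires at least one digit after the dot; changing [\d]+ to [\d]* in that group would accept it.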
Upvotes: 1
Reputation: 179096
Start with a pattern that gets you what you want:
\d+
Now you also want to match decimal numbers, so expand your options:
This one matches numbers followed by an optional decimal point:
\d+\.?
This one matches decimal numbers:
\d*\.\d+
Joining both will give you a solid number-matching pattern (this may still have issues if you don't want to match numbers like 000.0000):
(?:\d+\.?|\d*\.\d+)
Now comes the tricky part. You need to determine exactly what other characters may prefix or suffix the number.
Given the example, I will make the following assumptions:
% may prefix a decimal, but without a suffix
st, nd, rd, and th may suffix only whole numbers
Given these assumptions:
% characters can be optionally matched on decimals:
(?:%?(?:\d+\.?|\d*\.\d+))
Whole numbers with suffixes can be matched with the following (this does not validate the suffixes, so 1nd would be valid):
(?:\d+(?:st|nd|rd|th))
Joining these two patterns produces:
(?:(?:%?(?:\d+\.?|\d*\.\d+))|(?:\d+(?:st|nd|rd|th)))
Of course, you'll probably want to restrict the match to the entire string:
/^(?:(?:%?(?:\d+\.?|\d*\.\d+))|(?:\d+(?:st|nd|rd|th)))$/
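As a quick sanity check, the anchored pattern can be run against the samples from the question. This is a minimal sketch; the variable names numberLike and samples are illustrative only.

```javascript
// Anchored pattern copied from the line above.
const numberLike = /^(?:(?:%?(?:\d+\.?|\d*\.\d+))|(?:\d+(?:st|nd|rd|th)))$/;

// Sample inputs taken from the question.
const samples = ["123", "456", ".123", "123.", "%123",
                 "31st", "14th", "2nd", "100.55", "555.10"];

// Every sample should be accepted, so nothing is left after filtering.
const rejected = samples.filter((s) => !numberLike.test(s));
console.log(rejected); // []

// Repeated prefix characters are rejected.
console.log(numberLike.test("%%123%%")); // false

// As noted above, suffixes are not validated.
console.log(numberLike.test("1nd")); // true
```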
Upvotes: 3
Reputation: 68790
I tried to create a rule for each case:
(\d+(?:\.\d*)?) // 123 ; 123. ; 123.45
([%.]\d+) // %123 ; .123
(\d+(?:st|nd|th)) // 31st ; 2nd ; 14th
Then combined:
((?:\d+(?:\.\d*)?)|(?:[%.]\d+)|(?:\d+(?:st|nd|th)))
If you want something shorter, you can simply use ([%.\dshnd]+), but this will catch many unwanted entries, like %%123%%.
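One thing worth checking when the combined pattern is used to extract matches from longer text is the order of the alternatives: regex alternation takes the first branch that succeeds, so the plain-number branch can win before the suffix branch is tried. The sketch below illustrates this; the reordered variant and the sample sentence are assumptions added for illustration, not part of the answer.

```javascript
// Combined pattern as posted, with the global flag for extraction.
const mixed = /((?:\d+(?:\.\d*)?)|(?:[%.]\d+)|(?:\d+(?:st|nd|th)))/g;

// Assumed variant: same branches, suffix branch moved first.
const reordered = /(?:\d+(?:st|nd|th)|[%.]\d+|\d+(?:\.\d*)?)/g;

// The shorter character-class pattern.
const loose = /([%.\dshnd]+)/g;

const text = "due 31st: %123 or 100.55";

console.log(text.match(mixed));      // [ '31', '%123', '100.55' ]
console.log(text.match(reordered));  // [ '31st', '%123', '100.55' ]

// The character class accepts any run of the listed characters.
console.log("%%123%%".match(loose)); // [ '%%123%%' ]
```

With the posted order, 31st is extracted as 31 because the first branch already matches the digits; putting the suffix branch first keeps the suffix in the match.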
Upvotes: 2