Tech Avenue

Reputation: 81

Regex improvements for international, common and RFC 3966 phone number validation?

Context

Hi, earlier I was browsing the web looking for a quick answer on validating telephone numbers with a single regex: emergency, short, international, French, Spanish and North American numbers (normal, fancy and extended versions).

Strangely, I couldn't find anything better than "A comprehensive regex for phone number formula", which seems to be the best topic on the subject, unless I missed one, which is entirely possible.

I'm new to the site and this is my very first question (yeah!), since that other thread is currently on hold of some sort: it seems the author didn't get what he and I were looking for.

That makes at least three of us who would like a good solution, counting my pal, who first asked me to find one for simple integrations like his Google Forms.

Hence my current question(s), and my own answer to begin with, since I spent part of a night building my own regex based on the advice and test patterns from the best replies on the other thread. If you're interested in the topic, there are some interesting elements.

Questions

What is the best way to optimize and improve this regex (without resorting to coding), which is dedicated to validating international and most national phone numbers (following the recommendations of RFC 3966 at least)?

Not sure if I can add a related question as well (it still serves the purpose of making the regex pattern more useful), but no harm in asking, I guess.

Are there other commonly used formats that this regex should match (or reject)?

If you can add them (or a link) here so I can update my test bundles, I would be thankful. Equally useful would be phone numbers that should definitely not validate (the unwanted).

My initial solution

Another potential side dish is to isolate matching groups for the country code, area code and extension... and things work relatively dandy up to a certain point: it only works well when there are separators (or parentheses) to distinguish those groups of digits.

Matching goals

Another matching goal is to have a regex that does not under-perform too much; I'm not really picky, since it is not meant to be used in critical parts of code.

Still, how could we optimize the best regex(es) people will find or propose without changing their results?
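
One way to check that an optimization leaves the results untouched is to run the old and new patterns over the same test bundle and list every sample where they disagree. Below is a minimal sketch in Python, assuming a hypothetical sample list and using only the digit-run fragment of the pattern; the only change in `CANDIDATE` is dropping escapes that are redundant inside a character class.

```python
import re

# Hypothetical samples; swap in your own positive and negative test bundle.
SAMPLES = ["911", "555-1234", "555.12.34", "12", "5 5 5 1 2 3 4", "abc", "55--5"]

# Digit-run fragment of the pattern, with the escapes as originally written...
ORIGINAL = re.compile(r"^(?:[\-\.\/ \t\f]{0,2}\d){3,8}$")
# ...and the same fragment with the redundant escapes removed.
CANDIDATE = re.compile(r"^(?:[-./ \t\f]{0,2}\d){3,8}$")


def outcome(pattern, text):
    """Return the captured groups on a match, or None when there is no match."""
    m = pattern.match(text)
    return m.groups() if m else None


def differing_samples(old, new, samples):
    """List the samples on which the two patterns do not give the same outcome."""
    return [s for s in samples if outcome(old, s) != outcome(new, s)]


# An empty list means the rewrite behaved identically on this bundle.
print(differing_samples(ORIGINAL, CANDIDATE, SAMPLES))
```

An empty result only says the two patterns agree on the samples tried, so the bundle should include the unwanted formats as well as the wanted ones.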

Goals from the main thread

Not sure how it should work, since I thought that + (or its equivalent, the double zero 00) was required in front of any international number... I've always done it that way. The other thread had a list of positive matches without it.

Could someone confirm that + or 00 is not mandatory for US numbers? Thank you again.

Best of unwanted formats

Regex101.com was a big plus for rewriting and testing the regex to this point; I couldn't have progressed so far without its help. Yet I'm no expert, so I can only scratch the surface here, and I need your help to improve this.

Thank you for reading, it was very educational to write the question (but not something I would do every day, very time-consuming at my pace), hope it will find its answers as well. Have a nice day (or night... ;) ).

Upvotes: 5

Views: 1882

Answers (1)

Tech Avenue

Reputation: 81

Before I forget, here is the latest version of the regex I put together:

^(?=(?:\+|0{2})?(?:(?:[\(\-\)\.\/ \t\f]*\d){7,10})?(?:[\-\.\/ \t\f]?\d{2,3})(?:[\-\s]?[ext]{1,3}[\-\.\/ \t\f]?\d{1,4})?$)((?:\+|0{2})\d{0,3})?(?:[\-\.\/ \t\f]?)(\(0\d[ ]?\d{0,4}\)|\(\d{0,4}\)|\d{0,4})(?:[\-\.\/ \t\f]{0,2}\d){3,8}(?:[\-\s]?(?:x|ext)[\-\t\f ]?(\d{1,4}))?$

As far as I know, it passes the tests I put in the question and some more that I added on that Regex101.com page. You can even fork it, a very useful feature indeed, I'm a new fan. :)
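
For anyone who wants to try it outside Regex101, here is a minimal sketch in Python. The pattern is the one posted above, only split across lines for readability; the helper name `split_number` and the sample numbers are made up for illustration. It returns the three capture groups (international prefix, area code, extension), which also shows how the groups drift when there are no separators.

```python
import re

# The pattern exactly as posted above, split into adjacent raw strings.
PHONE_RE = re.compile(
    r"^(?=(?:\+|0{2})?(?:(?:[\(\-\)\.\/ \t\f]*\d){7,10})?"
    r"(?:[\-\.\/ \t\f]?\d{2,3})"
    r"(?:[\-\s]?[ext]{1,3}[\-\.\/ \t\f]?\d{1,4})?$)"
    r"((?:\+|0{2})\d{0,3})?"
    r"(?:[\-\.\/ \t\f]?)"
    r"(\(0\d[ ]?\d{0,4}\)|\(\d{0,4}\)|\d{0,4})"
    r"(?:[\-\.\/ \t\f]{0,2}\d){3,8}"
    r"(?:[\-\s]?(?:x|ext)[\-\t\f ]?(\d{1,4}))?$"
)


def split_number(text):
    """Return (prefix, area, extension) on a match, or None otherwise."""
    m = PHONE_RE.match(text)
    return m.groups() if m else None


print(split_number("+1 555-123-4567"))        # ('+1', '555', None)
print(split_number("(555) 123-4567 ext 89"))  # (None, '(555)', '89')
print(split_number("+15551234567"))           # ('+155', '5123', None)
print(split_number("12"))                     # None
```

The third call is the case to watch: without separators the number still validates, but the greedy prefix group swallows part of the area code, so the groups are only trustworthy when the input is punctuated.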

The code seems to work as is with PHP (PCRE), Python and JavaScript (but not Golang), with varying performance that is not awesome but good enough for our purpose.

For instance, I wanted to use \h for horizontal whitespace (instead of \t, \f and space), but it is less compatible across the different platforms.

It still needs a lot of improvement, and I'm eager to see what you will be cooking to answer this little problem of ours, but I'm spent... already a sunny morning here. Good night folks.
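
A quick way to see the compatibility issue on a given engine is to try compiling `\h` and fall back to the explicit class. The sketch below does this for Python's `re` module, which, as far as I know, rejects `\h` as a bad escape; the helper name is made up for illustration. Also worth noting: as far as I recall, PCRE counts form feed as vertical whitespace, so `\h` would not be an exact replacement for `[ \t\f]` anyway.

```python
import re

# Portable stand-in used in the pattern: space, tab and form feed only.
PORTABLE_BLANK = re.compile(r"[ \t\f]")


def engine_accepts_backslash_h():
    """Return True if this regex engine compiles \\h, False if it rejects it."""
    try:
        re.compile(r"\h")
    except re.error:
        return False
    return True


print("\\h supported:", engine_accepts_backslash_h())

# The explicit class accepts the three blanks and rejects a newline.
for ch in (" ", "\t", "\f", "\n"):
    print(repr(ch), bool(PORTABLE_BLANK.fullmatch(ch)))
```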

Upvotes: 3
