user958414

Reputation: 385

Regex for universal phone number

I have a regex for universal phone numbers:

/^(\+\d)*\s*(\(\d{3}\)\s*)*\d{3}(-{0,1}|\s{0,1})\d{2}(-{0,1}|\s{0,1})\d{2}$/

It is accepting the following strings:

339-4248 
(095) 2569835 
+7 (095) 1452389
+1(963)9632587
+12365874
2365789

But it's not accepting

+12589637412
+1 963 9632587
+1701234567

What's the matter with this? Please help me figure out where I am wrong.

Upvotes: 1

Views: 4294

Answers (3)

Pete Mancini

Reputation: 575

One thing you can do is to research all the formats. You have found a few good ones. There are more here: http://en.wikipedia.org/wiki/Local_conventions_for_writing_telephone_numbers

Next, find documents in your corpus that contain phone numbers, and others that contain numbers that aren't phone numbers. This matters less if you are dealing with structured data. The idea is to have a control group that shows your pattern isn't overreaching.

Then get a tool like visual-regexp (a common, OS-independent software package), paste your text into it, and keep refining regexes until you cover all of your cases.

Doing that with just your examples, I came up with this: regexp -nocase -all -line -- {\+?\(?[0-9]*\)?\ ?[0-9-]*} string match

--Pete
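
The control-group idea above can be sketched as a small check, here in Python rather than Tcl. This is only an illustration: the `evaluate` helper, the relaxed candidate pattern, and the three control strings are hypothetical and not from the thread; the positive samples are the nine numbers from the question.

```python
import re

def evaluate(pattern, should_match, should_not_match):
    """Report positives the pattern misses and control strings it wrongly accepts."""
    compiled = re.compile(pattern)
    missed = [s for s in should_match if not compiled.fullmatch(s)]
    false_hits = [s for s in should_not_match if compiled.fullmatch(s)]
    return {"missed": missed, "false_hits": false_hits}

# The nine example numbers from the question.
phone_samples = [
    "339-4248", "(095) 2569835", "+7 (095) 1452389", "+1(963)9632587",
    "+12365874", "2365789", "+12589637412", "+1 963 9632587", "+1701234567",
]

# Hypothetical control group: strings with digits that are not phone numbers.
control_samples = ["2012-05-14", "3.14159", "room 101"]

# Hypothetical relaxed candidate: optional '+', then 7 to 20 digits,
# parentheses, spaces, or hyphens.
candidate = r"\+?[0-9()\s-]{7,20}"

report = evaluate(candidate, phone_samples, control_samples)
print(report)
```

With these inputs the candidate covers every phone sample but also accepts the date "2012-05-14", which is exactly the kind of overreach a control group is meant to expose.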

Upvotes: 0

Dave Sherohman

Reputation: 46187

Why do you care where users choose to break up the groups of digits, or which characters they use to do so? Around here (Sweden), it's common to see one person write a given phone number as 046 123 456 789 and someone else write it 046 123 45 67 89, but both are dialed identically and are equally valid. (As, for that matter, would be 04 61 2345 6 78 9 - not a format I've ever seen used, but it still dials identically.)

Just strip out non-numeric characters (other than a leading +, since that's meaningful), check that it's a reasonable number of digits, store that, and render it into your preferred format when displaying the number. Or keep the format as entered by the user, although then you need to take the normal precautions to prevent SQL injection, XSS, XSRF, and similar attacks.
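
A minimal sketch of that strip-and-count approach, in Python. The function name `normalize_phone` and the lower bound of 7 digits are assumptions for illustration; the upper bound of 15 is the maximum length E.164 allows for an international number.

```python
import re
from typing import Optional

def normalize_phone(raw: str, min_digits: int = 7, max_digits: int = 15) -> Optional[str]:
    """Return the number as '+digits' or 'digits', or None if the length is implausible.

    min_digits is an assumed lower bound; max_digits follows the E.164 limit.
    """
    text = raw.strip()
    has_plus = text.startswith("+")          # only a leading '+' is meaningful
    digits = re.sub(r"[^0-9]", "", text)     # drop spaces, hyphens, parentheses, etc.
    if not (min_digits <= len(digits) <= max_digits):
        return None
    return ("+" if has_plus else "") + digits

print(normalize_phone("046 123 456 789"))   # 046123456789
print(normalize_phone("046 123 45 67 89"))  # 046123456789
print(normalize_phone("+7 (095) 1452389"))  # +70951452389
print(normalize_phone("12"))                # None
```

Both Swedish spellings collapse to the same stored value, so the grouping a user picked no longer matters for validation or comparison.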

Upvotes: 1

tripleee

Reputation: 189387

It only accepts certain digit counts (exactly seven digits after the optional +d prefix and the optional parenthesized area code), and it only accepts spaces in some places within a number. My recommendation would be to ditch it and revert to a really simple, relaxed check, or else a documented, supported, internationally tested solution (libphonenumber or some such).
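
To see this concretely, here is a small Python check that runs the question's pattern, unchanged, over the nine sample numbers. The helper `digits_after_prefix` is a hypothetical name added for illustration; it counts the digits left once a leading `+d` and any `(ddd)` groups are removed.

```python
import re

# The pattern from the question, unchanged.
PATTERN = re.compile(
    r"^(\+\d)*\s*(\(\d{3}\)\s*)*\d{3}(-{0,1}|\s{0,1})\d{2}(-{0,1}|\s{0,1})\d{2}$"
)

def digits_after_prefix(number: str) -> int:
    """Count digits remaining after the optional '+d' prefix and '(ddd)' groups."""
    rest = re.sub(r"^(\+\d)*\s*(\(\d{3}\)\s*)*", "", number)
    return sum(ch.isdigit() for ch in rest)

samples = [
    "339-4248", "(095) 2569835", "+7 (095) 1452389", "+1(963)9632587",
    "+12365874", "2365789",                            # accepted in the question
    "+12589637412", "+1 963 9632587", "+1701234567",   # rejected in the question
]

for s in samples:
    matched = bool(PATTERN.match(s))
    print(f"{s!r:20} match={matched!s:5} remaining_digits={digits_after_prefix(s)}")
```

Every accepted sample has exactly seven remaining digits, while the rejected ones have nine or ten, because the tail of the pattern is fixed at `\d{3}`, `\d{2}`, `\d{2}` with at most one separator between the groups.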

Upvotes: 0
