Reputation: 385
I have a regex for universal phone numbers:
**/^(\+\d)*\s*(\(\d{3}\)\s*)*\d{3}(-{0,1}|\s{0,1})\d{2}(-{0,1}|\s{0,1})\d{2}$/**
It is accepting the following strings:
339-4248
(095) 2569835
+7 (095) 1452389
+1(963)9632587
+12365874
2365789
But it's not accepting
+12589637412
+1 963 9632587
+1701234567
What's the matter with this? Please help me figure out where I am wrong.
Upvotes: 1
Views: 4294
Reputation: 575
One thing you can do is research all the formats. You have found a few good ones; there are more here: http://en.wikipedia.org/wiki/Local_conventions_for_writing_telephone_numbers
Next, find documents in your corpus that contain phone numbers, and others that contain numbers that aren't phone numbers. This matters less if you are dealing with structured data. The idea is to have a control group that shows your pattern isn't overreaching.
Then get something like visual-regexp (a common, OS-independent software package), paste your text into it, and keep refining regexes until you cover all of your cases.
Doing that with just your examples, I came up with this: regexp -nocase -all -line -- {\+?\(?[0-9]*\)?\ ?[0-9-]*} string match
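The control-group idea can be sketched as a small harness. This is only an illustration in Python, not the visual-regexp workflow itself: `evaluate` is a hypothetical helper, the phone samples are the strings from the question, and the control samples are made-up non-phone numbers.

```python
import re

def evaluate(pattern, should_match, should_not_match):
    """Return (missed, false_hits) for a candidate regex."""
    rx = re.compile(pattern)
    missed = [s for s in should_match if not rx.search(s)]
    false_hits = [s for s in should_not_match if rx.search(s)]
    return missed, false_hits

# The regex from the question, unchanged apart from Python raw-string syntax.
QUESTION_PATTERN = (
    r"^(\+\d)*\s*(\(\d{3}\)\s*)*\d{3}(-{0,1}|\s{0,1})\d{2}(-{0,1}|\s{0,1})\d{2}$"
)

# Strings the asker wants accepted.
PHONE_SAMPLES = [
    "339-4248", "(095) 2569835", "+7 (095) 1452389", "+1(963)9632587",
    "+12365874", "2365789", "+12589637412", "+1 963 9632587", "+1701234567",
]

# Assumed control group: numbers that are not phone numbers.
CONTROL_SAMPLES = ["2024-01-15", "12.50", "3.14159"]

missed, false_hits = evaluate(QUESTION_PATTERN, PHONE_SAMPLES, CONTROL_SAMPLES)
print("missed:", missed)
print("false hits:", false_hits)
```

Rerunning this after each change to the pattern shows at a glance whether a fix for one format broke another or started matching the control group.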
--Pete
Upvotes: 0
Reputation: 46187
Why do you care where users choose to break up the groups of digits or what characters they use to do so? Around here (Sweden), it's common to see one person write a given phone number as 046 123 456 789 and someone else write it 046 123 45 67 89, but both are dialed identically and are equally valid. (As, for that matter, would be 04 61 2345 6 78 9 - not a format I've ever seen used, but it still dials identically.)
Just strip out non-numeric characters (other than a leading +, since that's meaningful), check that it's a reasonable number of digits, store that, and render it into your preferred format when displaying the number. Or keep the format as entered by the user, although then you need to take the normal precautions to prevent SQL injection, XSS, XSRF, etc. attacks.
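A minimal sketch of that strip-and-count approach, in Python. The function name `normalize_phone` and the bounds are assumptions: 15 is the E.164 maximum, while 7 is simply the shortest sample in the question, so adjust both to your data.

```python
import re

def normalize_phone(raw, min_digits=7, max_digits=15):
    """Strip formatting but keep a leading '+'.

    Returns the normalized string, or None if the digit count is implausible.
    The default bounds are assumptions, not a standard-mandated range.
    """
    text = raw.strip()
    # A leading '+' is meaningful (international prefix), so remember it.
    prefix = "+" if text.startswith("+") else ""
    # Drop everything that is not an ASCII digit.
    digits = re.sub(r"[^0-9]", "", text)
    if not (min_digits <= len(digits) <= max_digits):
        return None
    return prefix + digits

for sample in ["046 123 456 789", "046 123 45 67 89", "04 61 2345 6 78 9"]:
    print(sample, "->", normalize_phone(sample))
```

All three Swedish spellings above collapse to the same stored value, which is the point: the grouping is presentation, not data.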
Upvotes: 1
Reputation: 189387
It only accepts exactly seven digits (grouped 3 + 2 + 2) after the optional country and area codes, and it only accepts spaces in some places within a number. My recommendation would be to ditch it and revert to a really simple, relaxed check, or else a documented, supported, internationally tested solution (Google's libphonenumber or some such).
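One possible shape for a relaxed check, sketched in Python. The allowed separator set (spaces, parentheses, hyphens) and the 7 to 15 digit range are assumptions chosen to fit the examples in the question, and `looks_like_phone` is a hypothetical name.

```python
import re

# Optional leading '+', then any mix of digits, spaces, parentheses and hyphens.
RELAXED = re.compile(r"\+?[0-9 ()-]+")

def looks_like_phone(text, min_digits=7, max_digits=15):
    """Loose plausibility check; it does not prove the number is dialable."""
    text = text.strip()
    if not RELAXED.fullmatch(text):
        return False
    digit_count = len(re.findall(r"[0-9]", text))
    return min_digits <= digit_count <= max_digits

for sample in ["+12589637412", "+1 963 9632587", "+1701234567", "hello"]:
    print(sample, "->", looks_like_phone(sample))
```

This deliberately ignores where the separators fall, so the three strings the original pattern rejected pass along with the rest.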
Upvotes: 0