Reputation: 125
I'm trying to extract phone numbers from a set of data. It has to extract both international and local numbers for every country.
The rules I've laid out for it are:
1. Look for the international symbol (+), which indicates an international dialing number with a valid country code (from +1 to +999).
2. If the + is present, make sure the character right after it is a digit.
3. If there is no +, check that the number is between 7 and 10 digits long.
4. If the number is divided (correctly, per international standards) by hyphens (-) or spaces, make sure each group of digits between them is either 3 or 4 digits long.
What I've got so far is:
\+(?=[1-999])(\d{4}[0-9][-\s]\d{3}[0-9][-\s]\d{4}[0-9])|(\d{7,11}[0-9])
That's for international numbers. For local numbers I search with \d{7,10}
The problem is that it doesn't pick up numbers containing spaces or hyphens. Can anybody give me some advice?
Upvotes: 1
Reputation: 46643
I'm not sure it's possible to write one regex that matches every country's numbers; some countries have conflicting rules.
It's entirely possible, for example, to have two valid local numbers contained within one valid international number: both 177 and 186-0039-011-81-90-1177-1177 are valid phone numbers in the same country.
You might want to start by looking at some of the answers to this question:
A comprehensive regex for phone number validation
If you're trying to build something definitive for every country, good luck; you'll probably need to spend a while with some technical standards.
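The overlap is easy to reproduce with Python's re module (used here only for illustration; the question doesn't name a language): a plain search for 177 also fires inside the longer number. Refusing a digit on either side with lookarounds removes those partial hits, but no lookaround can tell you whether a hyphen-separated chunk belongs to a longer number or stands alone, which is the ambiguity this answer is about.

```python
import re

long_number = "186-0039-011-81-90-1177-1177"

# A naive search for the short number "177" also matches inside each "1177".
naive_hits = re.findall(r"177", long_number)
print(naive_hits)  # ['177', '177']

# Lookarounds that forbid a digit directly before or after the match
# remove those partial hits, but cannot resolve which grouping is "the" number.
bounded_hits = re.findall(r"(?<!\d)177(?!\d)", long_number)
print(bounded_hits)  # []
```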
Upvotes: 0
Reputation: 336128
\d already means "digit", so you shouldn't put another [0-9] (which means the same thing) after it: \d{4}[0-9] matches five digits, not four.
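A quick check, sketched with Python's re module (an assumption, since the question doesn't name a language), shows the off-by-one effect of the extra [0-9]:

```python
import re

# For ASCII input, \d and [0-9] each match one digit,
# so \d{4}[0-9] needs FIVE digits in a row, not four.
pattern = re.compile(r"\d{4}[0-9]")

print(pattern.fullmatch("1234") is not None)      # False: only four digits
print(pattern.fullmatch("12345") is not None)     # True: five digits required
print(re.fullmatch(r"\d{4}", "1234") is not None) # True: what was intended
```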
In the same vein, [1-999] doesn't mean what you think it does. It's a character class, so it matches exactly one digit between 1 and 9. You probably want \d{1,3}, although that would also match 0.
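The same kind of check, again sketched in Python, shows that [1-999] is a one-character class. In the original regex it also sits inside a lookahead (?=...), which consumes nothing, so it only tests the single character right after the +. The [1-9]\d{0,2} variant at the end is one possible way (an assumption, not part of this answer) to allow a 1- to 3-digit code without a leading zero.

```python
import re

# [1-999] is the range 1-9 plus the redundant characters 9 and 9:
# it matches exactly ONE character, the same as [1-9].
one_char = re.compile(r"[1-999]")
print(one_char.fullmatch("7") is not None)    # True
print(one_char.fullmatch("44") is not None)   # False: two characters
print(one_char.fullmatch("999") is not None)  # False: three characters

# \d{1,3} accepts a 1- to 3-digit code, but also "0" (and "007").
code = re.compile(r"\d{1,3}")
print(code.fullmatch("44") is not None, code.fullmatch("0") is not None)  # True True

# Assumed alternative: first digit 1-9, then up to two more digits.
code_no_zero = re.compile(r"[1-9]\d{0,2}")
print(code_no_zero.fullmatch("0") is not None, code_no_zero.fullmatch("999") is not None)  # False True
```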
Also, you're only allowing one way of dividing the blocks (4-3-4). Why? That won't match many, many valid phone numbers.
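A minimal sketch of rule 4 from the question, in Python, assuming "groups of 3 or 4 digits separated by a single space or hyphen" is the whole rule: a repeated group accepts any mix of group sizes instead of one fixed layout. The name GROUPED is hypothetical, and the pattern deliberately ignores the + prefix and the overall length limits.

```python
import re

# Hypothetical pattern for rule 4 only: a 3-4 digit group, then one or more
# further 3-4 digit groups, each preceded by exactly one space or hyphen.
GROUPED = re.compile(r"\d{3,4}(?:[ -]\d{3,4})+")

for sample in ["0123-456-7890", "012 345 6789", "555-1234", "12-345"]:
    # fullmatch: the whole string must follow the grouping rule
    print(sample, GROUPED.fullmatch(sample) is not None)
# 0123-456-7890 True
# 012 345 6789 True
# 555-1234 True
# 12-345 False
```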
I would suggest the following:
Search your string using the regex \+?(?=\d)[\d\s-]{7,13}\b to grab anything that remotely looks like a phone number. Perhaps you also want to include parentheses and slashes in the allowed character list: \+?(?=\d)[\d\s/()-]{7,14}\b
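Here is roughly what that first pass looks like in Python (the language, the sample text, and the name CANDIDATE are assumptions for illustration). Because \s is in the character class and \b can fall right after a space, a match may end with trailing whitespace, so the sketch strips each candidate. Note also that the upper bound of 14 can cut a longer, space-separated number short at an earlier word boundary, so the bound may need widening for your data.

```python
import re

# Broad first-pass pattern from the answer (version with parentheses and slashes).
CANDIDATE = re.compile(r"\+?(?=\d)[\d\s/()-]{7,14}\b")

text = "Call +1 555 123 4567 or 555-1234, ref 42."

# The pattern has no capturing groups (the parentheses sit inside a
# character class), so findall returns whole matches; strip() drops
# any whitespace the character class picked up at the edges.
candidates = [m.strip() for m in CANDIDATE.findall(text)]
print(candidates)  # ['+1 555 123 4567', '555-1234']
```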
Then process and validate those strings separately, ideally after removing all punctuation and whitespace (except the +).
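A minimal sketch of that second step, in Python, using limits this answer doesn't give and that are therefore assumptions: 7 to 10 digits for local numbers (rule 3 of the question) and, for international ones, a + followed by 7 to 15 digits, where 15 is the E.164 maximum and 7 is an arbitrary lower bound. normalize, is_plausible and extract_numbers are hypothetical helper names.

```python
import re

# First-pass pattern from the answer: anything that remotely looks like a number.
CANDIDATE = re.compile(r"\+?(?=\d)[\d\s/()-]{7,14}\b")


def normalize(candidate: str) -> str:
    """Remove every character except digits and '+'."""
    return re.sub(r"[^\d+]", "", candidate)


def is_plausible(number: str) -> bool:
    # Assumed limits: '+' then 7-15 digits (international, 15 = E.164 maximum),
    # or 7-10 digits with no '+' (local, from the question's rule 3).
    return re.fullmatch(r"\+\d{7,15}|\d{7,10}", number) is not None


def extract_numbers(text: str) -> list[str]:
    found = []
    for match in CANDIDATE.finditer(text):
        number = normalize(match.group())
        if is_plausible(number):
            found.append(number)
    return found


print(extract_numbers("Call +1 555 123 4567 or 555-1234, ref 42."))
# ['+15551234567', '5551234']
```

Doing the length checks on the normalized string keeps the regexes simple: the separators only matter for finding candidates, not for deciding whether they are valid.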
Upvotes: 1