Reputation: 19
This regular expression should match phone numbers with or without separators:
phonePattern = re.compile(r'^(\d{3})\D*(\d{3})\D+(\d{4})\D*(\d*)$')
It works well for a phone number like 800-555-1212-1234, but it still doesn't match 80055512121234, even though I'm using * to indicate zero or more non-digit characters.
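A minimal reproduction of the behaviour described above, using only the pattern from the question. It suggests the culprit is the + after the second \D, which needs at least one non-digit between 555 and 1212:

```python
import re

# The pattern from the question; note the "+" after the second \D.
phonePattern = re.compile(r'^(\d{3})\D*(\d{3})\D+(\d{4})\D*(\d*)$')

with_dashes = phonePattern.search('800-555-1212-1234')
without_dashes = phonePattern.search('80055512121234')

print(with_dashes.groups())  # ('800', '555', '1212', '1234')
print(without_dashes)        # None: \D+ needs a non-digit between 555 and 1212
```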
Upvotes: 0
Views: 179
Reputation: 25461
You have \D+ (one or more non-digits) between the second and third groups of your regexp, so at least one separator is required there. Also, you don't want zero or more delimiters; you want either exactly one delimiter or none at all, so:
^(\d{3})\D?(\d{3})\D?(\d{4})\D?(\d*)$
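A quick check of the \D? version against both numbers from the question. This is a sketch; `results` is just a local name for collecting the captured groups:

```python
import re

# \D? allows at most one non-digit separator between the groups.
pattern = re.compile(r'^(\d{3})\D?(\d{3})\D?(\d{4})\D?(\d*)$')

results = {}
for number in ('800-555-1212-1234', '80055512121234'):
    match = pattern.search(number)
    results[number] = match.groups() if match else None
    print(number, '->', results[number])
```

Both inputs should yield the same groups, ('800', '555', '1212', '1234').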
Anyway, I would use - instead of the non-digit \D if you don't want to match something like 123a456b7890c:
^(\d{3})-?(\d{3})-?(\d{4})-?(\d*)$
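To see the difference, here is a small comparison of the two patterns above on a dashed number, a bare number and the letter-separated string. The variable names are only illustrative:

```python
import re

any_separator = re.compile(r'^(\d{3})\D?(\d{3})\D?(\d{4})\D?(\d*)$')
dash_only = re.compile(r'^(\d{3})-?(\d{3})-?(\d{4})-?(\d*)$')

for candidate in ('800-555-1212', '8005551212', '123a456b7890c'):
    # Columns: input, matched by \D? version, matched by -? version
    print(candidate,
          bool(any_separator.search(candidate)),
          bool(dash_only.search(candidate)))
```

Only the dash-only pattern rejects 123a456b7890c; both accept the other two inputs.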
The regular expression in words:
^ : beginning of the string
(\d{3}) : a group of 3 digits
-? : no dash or a single dash
(\d*) : a group of zero or more digits
$ : end of the string

Also, I can recommend the "Case study: Parsing Phone Numbers" chapter of the Dive Into Python book for further reading.
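The same breakdown can live inside the pattern itself with Python's re.VERBOSE flag, which ignores whitespace and treats # as a comment. A sketch of the dash-only pattern written that way (the name `phone_pattern` is just for illustration):

```python
import re

# Equivalent to ^(\d{3})-?(\d{3})-?(\d{4})-?(\d*)$, one token per line.
phone_pattern = re.compile(r'''
    ^           # beginning of the string
    (\d{3})     # a group of 3 digits
    -?          # no dash or a single dash
    (\d{3})     # a group of 3 digits
    -?          # no dash or a single dash
    (\d{4})     # a group of 4 digits
    -?          # no dash or a single dash
    (\d*)       # a group of zero or more digits
    $           # end of the string
    ''', re.VERBOSE)

print(phone_pattern.search('800-555-1212-1234').groups())
```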
Update: Josh Smeaton makes a good point in his comment. Depending on your use case, it may be easier to sanitize the string first (i.e. remove the dashes); validation is then just a matter of checking that every character is a digit and that the length is right. If you're storing these phone numbers somewhere, it's better to keep them in one common format rather than some with dashes and some without.
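A rough sketch of the sanitize-then-validate idea. `normalize_phone` and its `expected_length=10` default are hypothetical (10 digits assumes a North American number without an extension, so 800-555-1212-1234 would be rejected), and dashes are assumed to be the only separator. `str.isascii()` needs Python 3.7+:

```python
def normalize_phone(raw, expected_length=10):
    """Hypothetical helper: return the number as plain digits, or None.

    Assumes dashes are the only separator and that a valid number has
    exactly `expected_length` digits.
    """
    digits = raw.replace('-', '')
    # isdigit() alone also accepts non-ASCII digits (e.g. superscripts),
    # so require ASCII as well.
    if digits.isascii() and digits.isdigit() and len(digits) == expected_length:
        return digits
    return None

print(normalize_phone('800-555-1212'))  # 8005551212
print(normalize_phone('8005551212'))    # 8005551212
print(normalize_phone('800-555-12'))    # None
```

Both spellings of the number end up stored in the same format, which is the point of sanitizing first.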
Upvotes: 4
Reputation: 121
Your second \D is followed by +, which matches one or more non-digits, so the string must contain at least one separator between the second and third groups. Replacing it with * will also match your second string, so your regexp would look like:
'^(\d{3})\D*(\d{3})\D*(\d{4})\D*(\d*)$'
However, as erip and Dawid Ferenczy suggested, it's probably a good idea to use ? instead, which matches at most one character:
'^(\d{3})\D?(\d{3})\D?(\d{4})\D?(\d*)$'
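The practical difference between the two suggestions shows up with doubled separators: * accepts any number of them, ? at most one. A small comparison, with illustrative variable names:

```python
import re

star = re.compile(r'^(\d{3})\D*(\d{3})\D*(\d{4})\D*(\d*)$')
optional = re.compile(r'^(\d{3})\D?(\d{3})\D?(\d{4})\D?(\d*)$')

for candidate in ('80055512121234', '800--555--1212'):
    # Columns: input, matched by \D* version, matched by \D? version
    print(candidate, bool(star.search(candidate)), bool(optional.search(candidate)))
```

Both accept the bare number; only the \D* version accepts 800--555--1212.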
Upvotes: 0