adrianm

Reputation: 14736

trim phone number with regex

Probably an easy regex question.

How do I remove all non-digits except a leading + from a phone number?

For example:

012-3456 => 0123456
+1 (234) 56789 => +123456789

Upvotes: 5

Views: 10806

Answers (8)

g1smd

Reputation: 43

How do I remove all non-digits except leading + from a phone number?

Removing ( and ) and spaces from +44 (0) 20 3000 9000 results in the invalid number +4402030009000. It should be +442030009000, because the (0) trunk code is dropped once the country code is present.

The tidying routine needs several steps to handle the country code (with or without an access code or +), the trunk code, and punctuation, alone or in any combination.
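
A minimal sketch of such a multi-step routine in Python, assuming the only trunk-code notation to handle is a parenthesised (0) after a + country code. The function name tidy_international is made up for illustration, and real-world numbers need more cases than this.

```python
import re

def tidy_international(raw: str) -> str:
    """Hypothetical helper: tidy a number that may carry a '+' country code."""
    s = raw.strip()
    # Step 1: when a country code is present, drop an optional "(0)" trunk code.
    if s.startswith("+"):
        s = re.sub(r"\(\s*0\s*\)", "", s)
    # Step 2: remember the leading '+', then keep ASCII digits only.
    prefix = "+" if s.startswith("+") else ""
    return prefix + re.sub(r"[^0-9]", "", s)

print(tidy_international("+44 (0) 20 3000 9000"))  # +442030009000
```

Numbers written with an access code such as 00 instead of +, or with an unbracketed trunk code, are not covered by this sketch.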

Upvotes: 2

Trey Hunner

Reputation: 11814

If global substitution (the g flag) is supported, you could simply remove every character that is not a digit or a plus sign:

s/[^0-9+]//g

If global substitution is not supported, you could match as many digit groups as your phone number format can contain:

s/([0-9+]*)[^0-9+]*([0-9+]*)[^0-9+]*([0-9+]*)[^0-9+]*([0-9+]*)/\1\2\3\4/
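
To see how the single-pass variant behaves, here is a small Python sketch that applies the same four-group pattern exactly once (count=1 stands in for a substitution without the g flag). The name strip_once is only illustrative.

```python
import re

# The four-group pattern from above, compiled once.
FOUR_GROUPS = re.compile(
    r"([0-9+]*)[^0-9+]*([0-9+]*)[^0-9+]*([0-9+]*)[^0-9+]*([0-9+]*)"
)

def strip_once(raw: str) -> str:
    # count=1 performs a single replacement, like s/// without the g flag.
    return FOUR_GROUPS.sub(r"\1\2\3\4", raw, count=1)

print(strip_once("+1 (234) 56789"))  # +123456789
print(strip_once("1-2-3-4-5"))       # 1234-5 (the fifth group is left untouched)
```

The second call shows the limitation: any input with more digit groups than the pattern has capture groups is only partly cleaned.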

Upvotes: 1

dawg

Reputation: 104032

It is certainly possible to do it all in one regex, but I prefer simpler regexes that handle the leading plus and any leading or trailing whitespace correctly:

#!/usr/bin/perl 
while (<DATA>) {
    print "DATA Read: \$_=$_";  #\n already there...
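    s/^\s+|\s+$//g;            # strip leading and trailing whitespace
    $s = s/^\+// ? '+' : '';   # remember a leading +, then remove it
    s/\D//g;                   # drop everything that is not a digit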
    print "Formatted: $s$_\n====\n";
 }


 __DATA__
 012-3456
 +1 (234) 56789
          +1 (234) 56789
 1234-56789        |
 +12345+6789

Output:

DATA Read: $_=012-3456
Formatted: 0123456
====
DATA Read: $_=+1 (234) 56789
Formatted: +123456789
====
DATA Read: $_=         +1 (234) 56789
Formatted: +123456789
====
DATA Read: $_=1234-56789        |
Formatted: 123456789
====
DATA Read: $_=+12345+6789
Formatted: +123456789

Upvotes: 1

Tim Pietzcker

Reputation: 336418

s/(?<!^)\+|[^\d+]+//g

will remove all non-digits and leave a leading + alone. Note that leading whitespace will cause the "leave + alone" part to fail. In .NET languages, this can be worked into the regex; in others, strip whitespace before passing the string to this regex.

Explanation:

(?<!^)\+: Match a + unless it's at the start of the string. (In .NET, use (?<!^\s*)\+ to allow for leading whitespace).

| or

[^\d+]+: Match any run of characters that are neither digits nor +.

Before (using (?<!^\s*)\+|[^\d+]+):

+49 (123) 234 5678
  +1 (555) 234-5678
+7 (23) 45/6789+10
(0123) 345/5678, ext. 666

After:

+491232345678
+15552345678
+72345678910
01233455678666
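
For engines without variable-length lookbehind, the strip-first approach might look like the following Python sketch. The function name clean is an assumption for this example; the pattern is the plain (?<!^) version from above.

```python
import re

# Matches a '+' that is not at the start, or any run of non-digit, non-'+' characters.
NOT_LEADING_PLUS_OR_JUNK = re.compile(r"(?<!^)\+|[^\d+]+")

def clean(raw: str) -> str:
    # Strip surrounding whitespace first so a leading '+' really sits at position 0.
    return NOT_LEADING_PLUS_OR_JUNK.sub("", raw.strip())

for sample in ["+49 (123) 234 5678", "  +1 (555) 234-5678",
               "+7 (23) 45/6789+10", "(0123) 345/5678, ext. 666"]:
    print(clean(sample))
```

Run on the four "Before" lines, this should reproduce the four "After" lines.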

Upvotes: 15

polygenelubricants

Reputation: 383886

In Java, you can do

public static String trimmed(String phoneNumber) {
   return phoneNumber.replaceAll("[^+\\d]", "");
}

This keeps every +, even one in the middle of phoneNumber. To remove any + that is not at the start, do something like this:

return phoneNumber.replaceAll("[^+\\d]|(?<=.)\\+", "");

(?<=.) is a lookbehind that checks whether any character precedes the +.

System.out.println("[" + trimmed("+1 (234)++56789 ") + "]");
// prints "[+123456789]"

Upvotes: 2

YOU

Reputation: 123881

Just replace everything except digits and + with ''

/[^\d+]/

In Python,

>>> import re
>>> re.sub(r"[^\d+]", "", "+1 (234) 56789")
'+123456789'
>>>

Upvotes: 0

adhanlon

Reputation: 6539

Use Perl:

my $number = '+1 (234) 56789';  # set this to the phone number
$number =~ s/[^\d+]//g;

This still allows a plus sign anywhere. If you want to allow a plus sign only at the beginning, I will leave that part up to you; you won't learn if the entire answer is handed to you.

Essentially, it replaces anything in $number that is not a digit or a plus sign with an empty string.

Upvotes: 0

Alexander

Reputation: 3754

You cannot simply remove the '+' symbol. It has to be treated like '00' and belongs to the country code. '+xx' is the same as '00xx'.

Anyway, handling phone numbers with regex is like parsing HTML with regex: nearly impossible, because there are so many (correct) ways to write them.

My advice would be to write a custom class for handling phone numbers and not to use regex.
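
As a rough illustration of that advice, here is a small regex-free sketch in Python. The class name PhoneNumber and its parse method are invented for this example. It assumes that 00 is the international access prefix, which holds in many countries but not all (in North America it is 011).

```python
from dataclasses import dataclass
import string

@dataclass(frozen=True)
class PhoneNumber:
    """Hypothetical value class; not a complete phone number parser."""
    international: bool  # True when written with '+' or a leading '00'
    digits: str          # the remaining digits, without any prefix

    @classmethod
    def parse(cls, raw: str) -> "PhoneNumber":
        s = raw.strip()
        international = s.startswith("+")
        # Keep ASCII digits only; no regex involved.
        digits = "".join(ch for ch in s if ch in string.digits)
        # Assumption: a leading '00' means the same as '+'.
        if not international and digits.startswith("00"):
            international = True
            digits = digits[2:]
        return cls(international, digits)

    def __str__(self) -> str:
        return ("+" if self.international else "") + self.digits

print(PhoneNumber.parse("+1 (234) 56789"))   # +123456789
print(PhoneNumber.parse("001 (234) 56789"))  # +123456789
print(PhoneNumber.parse("012-3456"))         # 0123456
```

Keeping the "international" flag separate from the digits means the + is never lost during cleanup and can be rendered as + or 00 as needed.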

Upvotes: -3
