Orion Edwards

Reputation: 123642

Searching for phone numbers in mysql

I have a table which is full of arbitrarily formatted phone numbers, like this

027 123 5644
021 393-5593
(07) 123 456
042123456

I need to search for a phone number in a similarly arbitrary format (e.g. 07123456 should find the entry (07) 123 456).

The way I'd do this in a normal programming language is to strip all the non-digit characters out of the 'needle', then go through each number in the haystack, strip all non-digit characters out of it, and compare it against the needle, e.g. in Ruby:

digits_only = lambda{ |n| n.gsub /[^\d]/, '' }

needle = digits_only[input_phone_number]
haystack.map(&digits_only).include?(needle)

The catch is, I need to do this in MySQL. It has a host of string functions, none of which really seem to do what I want.

Currently I can think of 2 'solutions'

However, neither of these seem like particularly elegant solutions.
Hopefully someone can help or I might be forced to use the %%%%%% solution

Update: This is operating over a relatively fixed set of data, with maybe a few hundred rows. I just didn't want to do something ridiculously bad that future programmers would cry over.

If the dataset grows I'll take the 'phoneStripped' approach. Thanks for all the feedback!


could you use a "replace" function to strip out any instances of "(", "-" and " ",

I'm not concerned about the result being numeric. The main characters I need to consider are +, -, (, ) and space. So would that solution look like this?

SELECT * FROM people 
WHERE 
REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(phonenumber, '(', ''), ')', ''), '-', ''), ' ', ''), '+', '')
LIKE '123456'

Wouldn't that be terribly slow?

Upvotes: 19

Views: 25240

Answers (18)

Meloman

Reputation: 3712

In my case, I needed to identify Swiss (CH) mobile phone numbers in the phone column and move them to the mobile column.

As all Swiss mobile phone numbers start with 07x or +417x, here is the regex to use:

/^(\+[0-9][0-9]\s*|0|)7.*/mgix

It finds all numbers like the following:

  • +41 79 123 456 78
  • +417612345678
  • 076 123 456 78
  • 07812345678
  • 7712345678

and ignores all others like these:

  • +41 47 123 456 78
  • +413212345678
  • 021 123 456 78
  • 02212345678
  • 3412345678

In MySQL, this gives the following code:

UPDATE `contact` 
SET `mobile` = `phone`,
    `phone`  = ''
WHERE `phone` REGEXP '^(\\+[0-9][0-9]\\s*|0|)(7.*)$'

You'll need to strip special characters such as -/.() from your numbers first.

https://regex101.com/r/AiWFX8/1
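To try the pattern outside the database, here is a minimal Ruby sketch. The constant SWISS_MOBILE and the helper swiss_mobile? are hypothetical names, not part of the answer above; the pattern is the one from the regex101 link, anchored with \A, and the cleanup step only removes the characters -/.() mentioned above.

```ruby
# Hypothetical helper mirroring the pattern above: an optional "+NN" country
# prefix or a leading 0, followed by a 7.
SWISS_MOBILE = /\A(\+[0-9][0-9]\s*|0|)7.*/

def swiss_mobile?(raw)
  # Remove the special characters the answer says to clean first: - / . ( )
  cleaned = raw.strip.gsub(%r{[-/.()]}, '')
  SWISS_MOBILE.match?(cleaned)
end

puts swiss_mobile?('+41 79 123 456 78') # true
puts swiss_mobile?('021 123 456 78')    # false
```

This is only a way to sanity-check which numbers the pattern accepts before running the UPDATE.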

Upvotes: 0

ideaztech

Reputation: 2596

Here is a working solution for PHP users.

This uses a loop in PHP to build the regular expression, then searches the database in MySQL with the RLIKE operator.

$phone = '(456) 584-5874';                     // can be any format
$phone = preg_replace('/[^0-9]/', '', $phone); // strip non-numeric characters
$len = strlen($phone);                         // get length of phone number
$regex = '';                                   // initialize before appending
for ($i = 0; $i < $len - 1; $i++) {
    $regex .= $phone[$i] . "[^[:digit:]]*";    // digit, then any non-digits
}
$regex .= $phone[$len - 1];                    // last digit, no trailing filler

This creates a Regular Expression that looks like this: 4[^[:digit:]]*5[^[:digit:]]*6[^[:digit:]]*5[^[:digit:]]*8[^[:digit:]]*4[^[:digit:]]*5[^[:digit:]]*8[^[:digit:]]*7[^[:digit:]]*4

Now formulate your MySQL something like this:

$sql = "SELECT Client FROM tb_clients WHERE Phone RLIKE '$regex'";

NOTE: I tried several of the other posted answers but found performance issues. For example, on our large database, it took 16 seconds to run the IsNumeric example. But this solution ran instantly. And this solution is compatible with older MySQL versions.
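For readers working in Ruby (the language used in the question), the same regex-building idea can be sketched roughly as follows. The method name phone_regex_source is hypothetical, and the result is only the pattern string; passing it to MySQL is left to whatever database layer you use.

```ruby
# Hypothetical helper: builds the pattern string described above, with
# "[^[:digit:]]*" placed between consecutive digits of the search term.
def phone_regex_source(input)
  digits = input.gsub(/[^0-9]/, '') # strip non-numeric characters
  digits.chars.join('[^[:digit:]]*')
end

source = phone_regex_source('(456) 584-5874')
puts source

# The same pattern also works as a Ruby regex for a quick local check.
puts Regexp.new(source).match?('456.584.5874') # true
```

Because the pattern is built only from digits and a fixed filler, no user-supplied characters end up in the SQL string.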

Upvotes: 1

Bréndal Teixeira

Reputation: 1165

As John Dyer said, you should consider fixing the data in the DB and storing only numbers. However, if you are in the same situation as I was (I cannot run an update query), the workaround I found was to combine 2 queries.

The "inside" query will retrieve all the phone numbers and format them by removing the non-numeric characters.

SELECT REGEXP_REPLACE(column_name, '[^0-9]', '') phone_formatted FROM table_name

The result will be all phone numbers without any special characters. After that, the "outside" query just needs to get the entry you are looking for. Combined, the 2 queries are:

SELECT phone_formatted FROM (
    SELECT REGEXP_REPLACE(column_name, '[^0-9]', '') phone_formatted FROM table_name
) AS result WHERE phone_formatted = '9999999999'

Important: the alias AS result is not referenced anywhere, but it must be there because MySQL requires every derived table to have an alias.

Upvotes: 4

Grbts

Reputation: 131

See

http://www.mfs-erp.org/community/blog/find-phone-number-in-database-format-independent

It is not really an issue that the regular expression would become visually appalling, since only mysql "sees" it. Note that instead of '+' (cfr. post with [\D] from the OP) you should use '*' in the regular expression.

Some users are concerned about performance (non-indexed search), but in a table with 100000 customers, this query, when issued from a user interface, returns immediately, without noticeable delay.

Upvotes: 2

Heisenberg

Reputation: 1508

I would use Google's libphonenumber to format each number in E.164 format. I would add a second column called "e164_number" to store the E.164-formatted number and add an index on it.

Upvotes: 0

crono

Reputation: 3663

This is a problem with MySQL - the regex function can match, but it can't replace. See this post for a possible solution.

Upvotes: 2

Nihal

Reputation: 81

I know this is ancient history, but I found it while looking for a similar solution.

A simple REGEXP may work:

select * from phone_table where phone1 REGEXP "07[^0-9]*123[^0-9]*456"

This would match the phone1 column with or without any separating characters.

Upvotes: 8

Sathish

Reputation: 419

Create a user-defined function that dynamically builds the regex.

DELIMITER //

CREATE FUNCTION udfn_GetPhoneRegex
(
    var_Input VARCHAR(25)
)
RETURNS VARCHAR(200)
DETERMINISTIC
BEGIN
    DECLARE iterator   INT          DEFAULT 1;
    DECLARE phoneregex VARCHAR(200) DEFAULT '';
    DECLARE output     VARCHAR(25)  DEFAULT '';

    -- Keep only the digits of the input.
    WHILE iterator < (LENGTH(var_Input) + 1) DO
        IF SUBSTRING(var_Input, iterator, 1) IN ('0', '1', '2', '3', '4', '5', '6', '7', '8', '9') THEN
            SET output = CONCAT(output, SUBSTRING(var_Input, iterator, 1));
        END IF;
        SET iterator = iterator + 1;
    END WHILE;

    -- Keep only the last 10 digits.
    SET output = RIGHT(output, 10);

    -- Allow any run of non-digits before each digit.
    SET iterator = 1;
    WHILE iterator < (LENGTH(output) + 1) DO
        SET phoneregex = CONCAT(phoneregex, '[^0-9]*', SUBSTRING(output, iterator, 1));
        SET iterator = iterator + 1;
    END WHILE;

    -- Anchor the pattern at the end of the stored value.
    SET phoneregex = CONCAT(phoneregex, '$');
    RETURN phoneregex;
END//

DELIMITER ;

Call that User Defined Function in your stored procedure.

DECLARE var_PhoneNumberRegex        VARCHAR(200);
SET var_PhoneNumberRegex = udfn_GetPhoneRegex('+ 123 555 7890');
SELECT * FROM Customer WHERE phonenumber REGEXP var_PhoneNumberRegex;
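To see what pattern the function produces without creating it in the database, the same logic can be sketched in Ruby. The name trailing_phone_regex and the keep parameter are hypothetical; the default of 10 mirrors the RIGHT(output, 10) call above.

```ruby
# Hypothetical Ruby equivalent of udfn_GetPhoneRegex: keep the last `keep`
# digits, put "[^0-9]*" before each one, and anchor at the end with "$".
def trailing_phone_regex(input, keep = 10)
  digits = input.gsub(/[^0-9]/, '')
  tail = digits.length > keep ? digits[-keep, keep] : digits
  tail.chars.map { |d| "[^0-9]*#{d}" }.join + '$'
end

pattern = trailing_phone_regex('+ 123 555 7890')
puts pattern
puts Regexp.new(pattern).match?('(123) 555-7890') # true
```

Since only the end of the value is anchored, a stored number with extra leading digits (such as a country code) still matches.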

Upvotes: 0

Michael Bagryantcev

Reputation: 31

I suggest using PHP functions rather than MySQL patterns, so you will have code like this:

$tmp_phone = '';
for ($i=0; $i < strlen($phone); $i++)
   if (is_numeric($phone[$i]))
       $tmp_phone .= '%'.$phone[$i];
$tmp_phone .= '%';
$search_condition .= " and phone LIKE '" . $tmp_phone . "' ";

Upvotes: 2

steve

Reputation:

A possible solution can be found at http://udf-regexp.php-baustelle.de/trac/

An additional package needs to be installed; then you can play with REGEXP_REPLACE.

Upvotes: 0

Michael Johnson

Reputation: 2307

My solution would be something along the lines of what John Dyer said. I'd add a second column (e.g. phoneStripped) that gets stripped on insert and update. Index this column and search on it (after stripping your search term, of course).

You could also add a trigger to automatically update the column, although I've not worked with triggers. But like you said, it's really difficult to write the MySQL code to strip the strings, so it's probably easier to just do it in your client code.
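As a rough illustration of doing the stripping in client code, here is an in-memory Ruby sketch. The class PhoneBook and its methods are hypothetical stand-ins: the Hash plays the role of the index on phoneStripped, and a real application would write the stripped value to the extra column on every insert and update instead.

```ruby
# Hypothetical in-memory stand-in for a table with an indexed phoneStripped column.
class PhoneBook
  def initialize
    @rows = []
    # Maps a stripped number to the rows that have it (the "index").
    @by_stripped = Hash.new { |hash, key| hash[key] = [] }
  end

  # Store the original formatting and the stripped copy side by side.
  def insert(name, phone)
    row = { name: name, phone: phone, phone_stripped: strip(phone) }
    @rows << row
    @by_stripped[row[:phone_stripped]] << row
    row
  end

  # Strip the search term the same way, then do an exact lookup.
  def find_by_phone(query)
    @by_stripped.fetch(strip(query), [])
  end

  private

  def strip(phone)
    phone.gsub(/\D/, '')
  end
end

book = PhoneBook.new
book.insert('Alice', '(07) 123 456')
puts book.find_by_phone('07123456').map { |row| row[:name] }.inspect
```

The point of the design is that the search becomes an exact match on an indexed value rather than a pattern match over every row.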

(I know this is late, but I just started looking around here :)

Upvotes: 2

John Dyer

Reputation: 2358

This looks like a problem from the start. Any kind of searching you do will require a table scan and we all know that's bad.

How about adding a column with a hash of the current phone numbers after stripping out all formatting characters? Then you can at least index the hash values and avoid a full-blown table scan.

Or is the amount of data small and not expected to grow much? Then maybe just sucking all the numbers into the client and running a search there.
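A minimal Ruby sketch of the hash idea, assuming SHA-256 as the hash function (the answer does not name one) and a hypothetical helper phone_hash. The value it returns is what would be stored in the extra indexed column and what the stripped search term would be hashed to before comparing.

```ruby
require 'digest'

# Hypothetical helper: hash of the number after stripping all formatting.
# SHA-256 is an assumption here; any stable hash would serve the same purpose.
def phone_hash(phone)
  Digest::SHA256.hexdigest(phone.gsub(/\D/, ''))
end

# Differently formatted versions of the same number hash identically.
puts phone_hash('(07) 123 456') == phone_hash('07123456') # true
```

For short values like phone numbers, indexing the stripped digits directly would work just as well; the hash mainly gives a fixed-width key.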

Upvotes: 14

Tanj

Reputation: 1354

If this is going to happen on a regular basis, perhaps modify the data to be all one format and then set up the search form to strip out any non-alphanumeric characters (if you allow numbers like 310-BELL). Having data in an easily searched format is half the battle.
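The normalization step could look something like this in Ruby. The helper normalize_phone is a hypothetical name, and upper-casing the letters is an extra assumption so that 310-bell and 310-BELL compare equal.

```ruby
# Hypothetical helper: drop everything except digits and letters, so that
# numbers like 310-BELL survive, and upper-case letters for consistent matching.
def normalize_phone(raw)
  raw.gsub(/[^0-9A-Za-z]/, '').upcase
end

puts normalize_phone('310-BELL')     # 310BELL
puts normalize_phone('(07) 123 456') # 07123456
```

The same function would be applied both when storing a number and when reading the search form, so both sides are compared in one format.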

Upvotes: 0

Orion Edwards

Reputation: 123642

Woe is me. I ended up doing this:

mre = mobile_number && ('%' + mobile_number.gsub(/\D/, '').scan(/./m).join('%'))

find(:first, :conditions => ['trim(mobile_phone) like ?', mre])

Upvotes: 0

crucible

Reputation: 3129

Just an idea, but couldn't you use Regex to quickly strip out the characters and then compare against that like @Matt Hamilton suggested?

Maybe even set up a view (not sure how MySQL handles views) that would hold all phone numbers stripped by regex to a plain phone number?

Upvotes: 0

Orion Edwards

Reputation: 123642

MySQL can search based on regular expressions.

Sure, but given the arbitrary formatting, if my haystack contained "(027) 123 456" (bear in mind the position of spaces can change; it could just as easily be 027 12 3456) and I wanted to match it with 027123456, would my regex therefore need to be this?

"^[\D]+0[\D]+2[\D]+7[\D]+1[\D]+2[\D]+3[\D]+4[\D]+5[\D]+6$"

(actually it'd be worse as the mysql manual doesn't seem to indicate it supports \D)

If that is the case, isn't it more or less the same as my %%%%% idea?

Upvotes: 0

megabytephreak

Reputation: 598

Is it possible to run a query to reformat the data to match a desired format and then just run a simple query? That way, even if the initial reformatting is slow, it doesn't really matter.

Upvotes: 2

Matt Hamilton

Reputation: 204139

An out-of-the-box idea, but could you use a "replace" function to strip out any instances of "(", "-" and " ", and then use an "isnumeric" function to test whether the resulting string is a number?

Then you could do the same to the phone number string you're searching for and compare them as integers.

Of course, this won't work for numbers like 1800-MATT-ROCKS. :)

Upvotes: 2
