Reputation: 123642
I have a table which is full of arbitrarily formatted phone numbers, like this
027 123 5644
021 393-5593
(07) 123 456
042123456
I need to search for a phone number in a similarly arbitrary format (e.g. 07123456 should find the entry (07) 123 456).
The way I'd do this in a normal programming language is to strip all the non-digit characters out of the 'needle', then go through each number in the haystack, strip all non-digit characters out of it, and compare it against the needle, e.g. (in Ruby):
digits_only = lambda{ |n| n.gsub /[^\d]/, '' }
needle = digits_only[input_phone_number]
haystack.map(&digits_only).include?(needle)
The catch is, I need to do this in MySQL. It has a host of string functions, none of which really seem to do what I want.
Currently I can think of 2 'solutions':
1. Cobble together a query out of CONCAT and SUBSTR
2. Insert a % between every character of the needle (so it's like this: %0%7%1%2%3%4%5%6%)
However, neither of these seems like a particularly elegant solution.
Hopefully someone can help, or I might be forced to use the %%%%%% solution.
If the dataset grows I'll take the 'phoneStripped' approach. Thanks for all the feedback!
could you use a "replace" function to strip out any instances of "(", "-" and " ",
I'm not concerned about the result being numeric.
The main characters I need to consider are +, -, (, ) and space.
So would that solution look like this?
SELECT * FROM people
WHERE
REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(phonenumber, '(', ''), ')', ''), '-', ''), ' ', ''), '+', '')
LIKE '123456'
Wouldn't that be terribly slow?
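For comparison, here is a minimal Ruby sketch of what the nested REPLACE calls compute, next to the strip-every-non-digit version from the question. Both helper names are made up for illustration, and the sketch says nothing about how fast MySQL would run the query.

```ruby
# Hypothetical helpers, not part of MySQL or any library.

# Mirrors the nested REPLACE calls: removes only the five listed characters.
# A '-' at the end of the String#delete argument is taken literally.
def strip_listed(number)
  number.delete('()+ -')
end

# Removes every non-digit character, like the digits_only lambda above.
def strip_non_digits(number)
  number.gsub(/\D/, '')
end

puts strip_listed('(07) 123-456')    # 07123456
puts strip_listed('07.123.456')      # 07.123.456 (dots are not in the list)
puts strip_non_digits('07.123.456')  # 07123456
```

The second call shows the trade-off: a fixed list of characters is easy to express with REPLACE, but any separator that is not on the list survives.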
Upvotes: 19
Views: 25240
Reputation: 3712
In my case, I needed to identify Swiss (CH) mobile phone numbers in the phone column and move them to the mobile column.
As all mobile phone numbers start with 07x or +417x, here is the regex to use:
/^(\+[0-9][0-9]\s*|0|)7.*/mgix
It matches all numbers in those formats and ignores all others.
In MySQL, this gives the following code:
UPDATE `contact`
SET `mobile` = `phone`,
`phone` = ''
WHERE `phone` REGEXP '^(\\+[0-9][0-9]\\s*|0|)(7.*)$'
You'll need to strip special characters such as -/.() from your numbers beforehand.
https://regex101.com/r/AiWFX8/1
Upvotes: 0
Reputation: 2596
Here is a working Solution for PHP users.
This uses a loop in PHP to build the Regular Expression. Then searches the database in MySQL with the RLIKE operator.
$phone = '(456) 584-5874'; // can be any format
$phone = preg_replace('/[^0-9]/', '', $phone); // strip non-numeric characters
$len = strlen($phone); // get length of phone number
$regex = ''; // start with an empty pattern
for ($i = 0; $i < $len - 1; $i++) {
    $regex .= $phone[$i] . "[^[:digit:]]*"; // digit, then any run of non-digits
}
$regex .= $phone[$len - 1]; // last digit, with no trailing separator class
This creates a Regular Expression that looks like this: 4[^[:digit:]]*5[^[:digit:]]*6[^[:digit:]]*5[^[:digit:]]*8[^[:digit:]]*4[^[:digit:]]*5[^[:digit:]]*8[^[:digit:]]*7[^[:digit:]]*4
Now formulate your MySQL something like this:
$sql = "SELECT Client FROM tb_clients WHERE Phone RLIKE '$regex'";
NOTE: I tried several of the other posted answers but found performance issues. For example, on our large database, it took 16 seconds to run the IsNumeric example. But this solution ran instantly. And this solution is compatible with older MySQL versions.
Upvotes: 1
Reputation: 1165
As John Dyer said, you should consider fixing the data in the DB and storing only numbers. However, if you are facing the same situation as I was (I cannot run an update query), the workaround I found was to combine 2 queries.
The "inside" query will retrieve all the phone numbers and format them removing the non-numeric characters.
SELECT REGEXP_REPLACE(column_name, '[^0-9]', '') phone_formatted FROM table_name
The result is every phone number without any special characters. After that, the "outside" query just needs to get the entry you are looking for. Combined, the 2 queries are:
SELECT phone_formatted FROM (
SELECT REGEXP_REPLACE(column_name, '[^0-9]', '') phone_formatted FROM table_name
) AS result WHERE phone_formatted = '9999999999'
Important: the AS result alias is never referenced, but it must be there because MySQL requires every derived table to have an alias.
Upvotes: 4
Reputation: 131
See
http://www.mfs-erp.org/community/blog/find-phone-number-in-database-format-independent
It is not really an issue that the regular expression would become visually appalling, since only mysql "sees" it. Note that instead of '+' (cfr. post with [\D] from the OP) you should use '*' in the regular expression.
Some users are concerned about performance (non-indexed search), but in a table with 100000 customers this query, when issued from a user interface, returns immediately, without noticeable delay.
Upvotes: 2
Reputation: 1508
I would use Google's libphonenumber to format each number in E.164 format. I would add a second column called "e164_number" to store the E.164-formatted number and add an index on it.
Upvotes: 0
Reputation: 3663
This is a problem with MySQL - the regex function can match, but it can't replace. See this post for a possible solution.
Upvotes: 2
Reputation: 81
I know this is ancient history, but I found it while looking for a similar solution.
A simple REGEXP may work:
select * from phone_table where phone1 REGEXP "07[^0-9]*123[^0-9]*456"
This would match the phone number column with or without any separating characters.
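The pattern above is written by hand for one number. A minimal Ruby sketch of building it from arbitrary user input could look like the following; phone_pattern is a hypothetical helper, and Ruby's own Regexp is used only to try the pattern out against the sample numbers from the question.

```ruby
# Hypothetical helper: turns any user input into a separator-tolerant pattern.
def phone_pattern(input)
  digits = input.gsub(/\D/, '')   # keep the digits only
  digits.chars.join('[^0-9]*')    # allow any run of non-digits between them
end

pattern = phone_pattern('07123456')
puts pattern  # 0[^0-9]*7[^0-9]*1[^0-9]*2[^0-9]*3[^0-9]*4[^0-9]*5[^0-9]*6

# Stand-in for the table: the sample numbers from the question.
haystack = ['027 123 5644', '021 393-5593', '(07) 123 456', '042123456']
matches = haystack.grep(Regexp.new(pattern))
p matches  # ["(07) 123 456"]
```

In a real query the pattern string would be passed to REGEXP as a bound parameter. Note that the pattern is unanchored, so it also matches when the digits appear inside a longer number, and an input with no digits produces an empty pattern that matches everything.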
Upvotes: 8
Reputation: 419
Create a user-defined function that dynamically builds the regex.
DELIMITER //
CREATE FUNCTION udfn_GetPhoneRegex
(
var_Input VARCHAR(25)
)
RETURNS VARCHAR(200)
BEGIN
DECLARE iterator INT DEFAULT 1;
DECLARE phoneregex VARCHAR(200) DEFAULT '';
DECLARE output VARCHAR(25) DEFAULT '';
WHILE iterator < (LENGTH(var_Input) + 1) DO
IF SUBSTRING(var_Input, iterator, 1) IN ( '0', '1', '2', '3', '4', '5', '6', '7', '8', '9' ) THEN
SET output = CONCAT(output, SUBSTRING(var_Input, iterator, 1));
END IF;
SET iterator = iterator + 1;
END WHILE;
SET output = RIGHT(output,10);
SET iterator = 1;
WHILE iterator < (LENGTH(output) + 1) DO
SET phoneregex = CONCAT(phoneregex,'[^0-9]*',SUBSTRING(output, iterator, 1));
SET iterator = iterator + 1;
END WHILE;
SET phoneregex = CONCAT(phoneregex,'$');
RETURN phoneregex;
END//
DELIMITER ;
Call that User Defined Function in your stored procedure.
DECLARE var_PhoneNumberRegex VARCHAR(200);
SET var_PhoneNumberRegex = udfn_GetPhoneRegex('+ 123 555 7890');
SELECT * FROM Customer WHERE phonenumber REGEXP var_PhoneNumberRegex;
Upvotes: 0
Reputation: 31
I suggest using PHP functions rather than MySQL patterns, so you will have some code like this:
$tmp_phone = '';
for ($i = 0; $i < strlen($phone); $i++) {
    // keep digits only, each preceded by a % wildcard
    if (is_numeric($phone[$i])) {
        $tmp_phone .= '%' . $phone[$i];
    }
}
$tmp_phone .= '%'; // trailing wildcard
$search_condition .= " and phone LIKE '" . $tmp_phone . "' ";
Upvotes: 2
Reputation:
A possible solution can be found at http://udf-regexp.php-baustelle.de/trac/
An additional package needs to be installed; then you can use REGEXP_REPLACE.
Upvotes: 0
Reputation: 2307
My solution would be something along the lines of what John Dyer said. I'd add a second column (e.g. phoneStripped) that gets stripped on insert and update. Index this column and search on it (after stripping your search term, of course).
You could also add a trigger to automatically update the column, although I've not worked with triggers. But like you said, it's really difficult to write the MySQL code to strip the strings, so it's probably easier to just do it in your client code.
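As a rough illustration of the stripped-column idea, here is an in-memory Ruby sketch. PhoneBook and its method names are invented for this example; a Hash stands in for the indexed phoneStripped column, and insert plays the role of the insert/update hook in client code.

```ruby
# In-memory stand-in for a table with an indexed phoneStripped column.
# PhoneBook and its methods are hypothetical names for this sketch.
class PhoneBook
  def initialize
    @rows = []                                    # the "table"
    @by_stripped = Hash.new { |h, k| h[k] = [] }  # the "index"
  end

  # Strip once, at write time, and store both forms.
  def insert(phone)
    row = { phone: phone, phone_stripped: strip(phone) }
    @rows << row
    @by_stripped[row[:phone_stripped]] << row
    row
  end

  # Strip the search term the same way, then do an exact lookup.
  def find(needle)
    @by_stripped.fetch(strip(needle), []).map { |row| row[:phone] }
  end

  private

  def strip(value)
    value.gsub(/\D/, '')
  end
end

book = PhoneBook.new
['027 123 5644', '021 393-5593', '(07) 123 456', '042123456'].each { |n| book.insert(n) }
p book.find('07123456')     # ["(07) 123 456"]
p book.find('04 212 3456')  # ["042123456"]
```

The point of the design is that the expensive part (stripping) happens once per write, so each search becomes a plain equality lookup that an index can serve.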
(I know this is late, but I just started looking around here :)
Upvotes: 2
Reputation: 2358
This looks like a problem from the start. Any kind of searching you do will require a table scan and we all know that's bad.
How about adding a column with a hash of the current phone numbers after stripping out all formatting characters. Then you can at least index the hash values and avoid a full blown table scan.
Or is the amount of data small and not expected to grow much? Then maybe just sucking all the numbers into the client and running a search there.
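The hashed-column idea can be sketched in Ruby as follows. This is only an illustration under assumptions: phone_hash is a hypothetical helper, SHA-256 is an arbitrary choice of hash, and a Ruby Hash stands in for the indexed column.

```ruby
require 'digest'

# Hypothetical helper: hash of the digits-only form of a phone number.
def phone_hash(number)
  Digest::SHA256.hexdigest(number.gsub(/\D/, ''))
end

stored = ['027 123 5644', '021 393-5593', '(07) 123 456', '042123456']

# Stand-in for an indexed hash column: hash value => original number.
index = stored.each_with_object({}) { |n, h| h[phone_hash(n)] = n }

found = index[phone_hash('07-123-456')]
p found  # "(07) 123 456"
```

A hash only supports exact matches on the whole number. If prefix or partial searches matter, indexing the stripped digits themselves is probably the more flexible choice.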
Upvotes: 14
Reputation: 1354
If this is something that is going to happen on a regular basis, it might be a good idea to convert the data to a single format and then set up the search form to strip out any non-alphanumeric characters (if you allow numbers like 310-BELL). Having data in an easily searched format is half the battle.
Upvotes: 0
Reputation: 123642
Woe is me. I ended up doing this:
mre = mobile_number && ('%' + mobile_number.gsub(/\D/, '').scan(/./m).join('%'))
find(:first, :conditions => ['trim(mobile_phone) like ?', mre])
Upvotes: 0
Reputation: 3129
Just an idea, but couldn't you use Regex to quickly strip out the characters and then compare against that like @Matt Hamilton suggested?
Maybe even set up a view (not sure of mysql on views) that would hold all phone numbers stripped by regex to a plain phone number?
Upvotes: 0
Reputation: 123642
MySQL can search based on regular expressions.
Sure, but given the arbitrary formatting, if my haystack contained "(027) 123 456" (bear in mind the position of the spaces can change; it could just as easily be 027 12 3456) and I wanted to match it with 027123456, would my regex therefore need to be this?
"^[\D]+0[\D]+2[\D]+7[\D]+1[\D]+2[\D]+3[\D]+4[\D]+5[\D]+6$"
(actually it'd be worse, as the MySQL manual doesn't seem to indicate that it supports \D)
If that is the case, isn't it more or less the same as my %%%%% idea?
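The two approaches are close but not identical, which a small Ruby experiment can show. This is a sketch only: Ruby regexes emulate the MySQL patterns ('.*' stands in for the LIKE wildcard %), the leading and trailing parts are simplified, and the third test number is invented to expose the difference.

```ruby
digits = '027123456'.chars

# LIKE-style pattern from the %%%%% idea, emulated with '.*' for '%'.
like_style = Regexp.new('\A.*' + digits.join('.*') + '.*\z')
# At least one non-digit required between every pair of digits ('+').
plus_style = Regexp.new('\A[^0-9]*' + digits.join('[^0-9]+') + '[^0-9]*\z')
# Zero or more non-digits allowed between digits ('*').
star_style = Regexp.new('\A[^0-9]*' + digits.join('[^0-9]*') + '[^0-9]*\z')

['(027) 123 456', '027 12 3456', '0927 123 456'].each do |stored|
  puts format('%-15s like=%-5s plus=%-5s star=%-5s', stored,
              like_style.match?(stored), plus_style.match?(stored),
              star_style.match?(stored))
end
```

The '+' version rejects both real entries, because adjacent digits such as "02" have no separator between them. The '*' version accepts them and rejects 0927 123 456. The wildcard version accepts all three, since % (like '.*') also lets extra digits slip in between, so it can return false positives.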
Upvotes: 0
Reputation: 598
Is it possible to run a query to reformat the data to match a desired format and then just run a simple query? That way, even if the initial reformatting is slow, it doesn't really matter.
Upvotes: 2
Reputation: 204139
An out-of-the-box idea, but could you use a "replace" function to strip out any instances of "(", "-" and " ", and then use an "isnumeric" function to test whether the resulting string is a number?
Then you could do the same to the phone number string you're searching for and compare them as integers.
Of course, this won't work for numbers like 1800-MATT-ROCKS. :)
Upvotes: 2