Reputation: 15111
I'm looking to find records in a table that match a specific number that the user enters. So, the user may enter 12345, but this could be 123zz4-5 in the database.
I imagine something like this would work, if PHP functions worked in MySQL.
SELECT * FROM foo WHERE preg_replace("/[^0-9]/","",bar) = '12345'
What's the equivalent function or way to do this with just MySQL?
Speed is not important.
Upvotes: 61
Views: 85608
Reputation: 3429
This blog post details how to strip non-numeric characters from a string via a MySQL function:
SELECT NumericOnly("asdf11asf");
returns 11
Upvotes: 3
Reputation: 3111
try this example. this is used for phone numbers, however you can modify it for your needs.
-- function removes non numberic characters from input
-- returne only the numbers in the string
CREATE DEFINER =`root`@`localhost` FUNCTION `remove_alpha`(inputPhoneNumber VARCHAR(50))
RETURNS VARCHAR(50)
CHARSET latin1
DETERMINISTIC
BEGIN
DECLARE inputLenght INT DEFAULT 0;
-- var for our iteration
DECLARE counter INT DEFAULT 1;
-- if null is passed, we still return an tempty string
DECLARE sanitizedText VARCHAR(50) DEFAULT '';
-- holder of each character during the iteration
DECLARE oneChar VARCHAR(1) DEFAULT '';
-- we'll process only if it is not null.
IF NOT ISNULL(inputPhoneNumber)
THEN
SET inputLenght = LENGTH(inputPhoneNumber);
WHILE counter <= inputLenght DO
SET oneChar = SUBSTRING(inputPhoneNumber, counter, 1);
IF (oneChar REGEXP ('^[0-9]+$'))
THEN
SET sanitizedText = Concat(sanitizedText, oneChar);
END IF;
SET counter = counter + 1;
END WHILE;
END IF;
RETURN sanitizedText;
END
To use this user defined function (UDF). Let's say you have a column of phone numbers:
col1
(513)983-3983
1-838-338-9898
phone983-889-8383
select remove_alpha(col1) from mytable
The result would be;
5139833983
18383389898
9838898383
Upvotes: 0
Reputation: 688
You can easily do what you want with REGEXP_REPLACE
(compatible with MySQL 8+ and MariaDB 10.0.5+)
REGEXP_REPLACE(expr, pat, repl[, pos[, occurrence[, match_type]]])
Replaces occurrences in the string expr that match the regular expression specified by the pattern pat with the replacement string repl, and returns the resulting string. If expr, pat, or repl is NULL, the return value is NULL.
Try it:
SELECT REGEXP_REPLACE('123asd12333', '[^0-9]+', '');
Output:
12312333
Upvotes: 34
Reputation: 1
It works for me:
SELECT * FROM foo WHERE REGEXP_REPLACE(bar, '[^0-9]', '') like '%12345%'
It finds a string with content such as: lfkfk-1kjnb23km4,5
Upvotes: -1
Reputation: 85
If you are on MySQL 5.7 or below, and you just need something quick and dirty without having to define a new function, and you have a small number of known non-numerics to filter out, something like this can work well...
Select replace(replace(replace(replace(replace(phone, '+1', ''), '(', ''), ')', ''), '-', ''), ' ', '') from customers;
Upvotes: 2
Reputation: 27
Being someone that has version 5.7, doesn't have the privilege to create functions, and it's not practical to bring my data down into my code, I found Nelson Miranda's Answer amazing. I wanted to share it in a subquery version which I found more useful.
DROP TABLE IF EXISTS ints;
CREATE TABLE ints (i INT UNSIGNED NOT NULL PRIMARY KEY);
INSERT INTO ints (i) VALUES
( 1), ( 2), ( 3), ( 4), ( 5), ( 6), ( 7), ( 8), ( 9), (10),
(11), (12), (13), (14), (15), (16), (17), (18), (19), (20);
SELECT * FROM foo f
WHERE (SELECT GROUP_CONCAT(SUBSTRING(f.bar, i, 1) ORDER BY i SEPARATOR '')
FROM ints
WHERE i BETWEEN 1 AND LENGTH(f.bar)
AND SUBSTRING(f.bar, i, 1) IN ('0', '1', '2', '3', '4', '5', '6', '7', '8', '9')) = '12345'
Note: The ints table can be a temporary table.
Upvotes: 0
Reputation: 1797
I think you don't need complicated functions for that.
I've found a REGEXP_REPLACE
solution using built-in mysql
character class names. You can read about them in a table in the docs. Basically, they are mysql-specific names for commonly matched groups of characters like [:alnum:]
for alpha-numeric characters, [:alpha:]
for only alphabetic characters and so on.
So my version of the REGEXP_REPLACE
:
REGEXP_REPLACE('My number is: +59 (29) 889-23-56', '[[:alpha:][:blank:][:punct:][:cntrl:]]', '')
will yield 59298892356
as per requirements.
Upvotes: 0
Reputation: 3592
Updated 2022: as per Marlom's answer, you can now use REGEX_REPLACE
- which will perform even better than my historical answer here.
Most upvoted answer above isn't the fastest.
Full kudos to them for giving a working proposal to bounce off!
This is an improved version:
DELIMITER ;;
DROP FUNCTION IF EXISTS `STRIP_NON_DIGIT`;;
CREATE DEFINER=`root`@`localhost` FUNCTION `STRIP_NON_DIGIT`(input VARCHAR(255)) RETURNS VARCHAR(255) CHARSET utf8
READS SQL DATA
BEGIN
DECLARE output VARCHAR(255) DEFAULT '';
DECLARE iterator INT DEFAULT 1;
DECLARE lastDigit INT DEFAULT 1;
DECLARE len INT;
SET len = LENGTH(input) + 1;
WHILE iterator < len DO
-- skip past all digits
SET lastDigit = iterator;
WHILE ORD(SUBSTRING(input, iterator, 1)) BETWEEN 48 AND 57 AND iterator < len DO
SET iterator = iterator + 1;
END WHILE;
IF iterator != lastDigit THEN
SET output = CONCAT(output, SUBSTRING(input, lastDigit, iterator - lastDigit));
END IF;
WHILE ORD(SUBSTRING(input, iterator, 1)) NOT BETWEEN 48 AND 57 AND iterator < len DO
SET iterator = iterator + 1;
END WHILE;
END WHILE;
RETURN output;
END;;
Testing 5000 times on a test server:
-- original
Execution Time : 7.389 sec
Execution Time : 7.257 sec
Execution Time : 7.506 sec
-- ORD between not string IN
Execution Time : 4.031 sec
-- With less substrings
Execution Time : 3.243 sec
Execution Time : 3.415 sec
Execution Time : 2.848 sec
Upvotes: 23
Reputation: 1
To search for numbers that match a particular numeric pattern in a string, first remove all the alphabets and special characters in a similar manner as below then convert the value to an integer and then search
SELECT *
FROM foo
WHERE Convert(Regexp_replace(bar, '[a-zA-Z]+', ''), signed) = 12345
Upvotes: 0
Reputation: 932
On MySQL 8.0+ there is a new native function called REGEXP_REPLACE. A clean solution to this question would be:
SELECT * FROM foo WHERE REGEXP_REPLACE(bar,'[^0-9]+',"") = '12345'
Upvotes: 9
Reputation: 1
thought I would share this since I built it off the function from here. I rearranged just so I can read it easier (I'm just server side).
You call it by passing in a table name and column name to have it strip all existing non-numeric characters from that column. I inherited a lot of bad table structures that put a ton of int fields as varchar so I needed a way to clean these up quickly before I can modify the column to an integer.
drop procedure if exists strip_non_numeric_characters;
DELIMITER ;;
CREATE PROCEDURE `strip_non_numeric_characters`(
tablename varchar(100)
,columnname varchar(100)
)
BEGIN
-- =============================================
-- Author: <Author,,David Melton>
-- Create date: <Create Date,,2/26/2019>
-- Description: <Description,,loops through data and strips out the bad characters in whatever table and column you pass it>
-- =============================================
#this idea was generated from the idea STRIP_NON_DIGIT function
#https://stackoverflow.com/questions/287105/mysql-strip-non-numeric-characters-to-compare
declare input,output varchar(255);
declare iterator,lastDigit,len,counter int;
declare date_updated varchar(100);
select column_name
into date_updated
from information_schema.columns
where table_schema = database()
and extra rlike 'on update CURRENT_TIMESTAMP'
and table_name = tablename
limit 1;
#only goes up to 255 so people don't run this for a longtext field
#just to be careful, i've excluded columns that are part of keys, that could potentially mess something else up
set @find_column_length =
concat("select character_maximum_length
into @len
from information_schema.columns
where table_schema = '",database(),"'
and column_name = '",columnname,"'
and table_name = '",tablename,"'
and length(ifnull(character_maximum_length,100)) < 255
and data_type in ('char','varchar')
and column_key = '';");
prepare stmt from @find_column_length;
execute stmt;
deallocate prepare stmt;
set counter = 1;
set len = @len;
while counter <= ifnull(len,1) DO
#this just removes it by putting all the characters before and after the character i'm looking at
#you have to start at the end of the field otherwise the lengths don't stay in order and you have to run it multiple times
set @update_query =
concat("update `",tablename,"`
set `",columnname,"` = concat(substring(`",columnname,"`,1,",len - counter,"),SUBSTRING(`",columnname,"`,",len - counter,",",counter - 1,"))
",if(date_updated is not null,concat(",`",date_updated,"` = `",date_updated,"`
"),''),
"where SUBSTRING(`",columnname,"`,",len - counter,", 1) not REGEXP '^[0-9]+$';");
prepare stmt from @update_query;
execute stmt;
deallocate prepare stmt;
set counter = counter + 1;
end while;
END ;;
DELIMITER ;
Upvotes: 0
Reputation: 1378
I came across this solution. The top answer by user1467716 will work in phpMyAdmin with a small change: add a second delimiter tag to the end of the code.
phpMyAdmin version is 4.1.14; MySQL version 5.6.20
I also added a length limiter using
DECLARE count INT DEFAULT 0;
in the declarations
AND count < 5
in the WHILE
statement
SET COUNT=COUNT+1;
in the IF
statement
Final form:
DROP FUNCTION IF EXISTS STRIP_NON_DIGIT;
DELIMITER $$
CREATE FUNCTION STRIP_NON_DIGIT(input VARCHAR(255))
RETURNS VARCHAR(255)
BEGIN
DECLARE output VARCHAR(255) DEFAULT '';
DECLARE iterator INT DEFAULT 1;
DECLARE count INT DEFAULT 0;
WHILE iterator < (LENGTH(input) + 1) AND count < 5 DO --limits to 5 chars
IF SUBSTRING(input, iterator, 1) IN ( '0', '1', '2', '3', '4', '5', '6', '7', '8', '9' ) THEN
SET output = CONCAT(output, SUBSTRING(input, iterator, 1));
SET COUNT=COUNT+1;
END IF;
SET iterator = iterator + 1;
END WHILE;
RETURN output;
END
$$
DELIMITER $$ --added this
Upvotes: 1
Reputation: 481
I realise that this is an ancient topic but upon googling this problem I couldn't find a simple solution (I saw the venerable agents but think this is a simpler solution) so here's a function I wrote, seems to work quite well.
DROP FUNCTION IF EXISTS STRIP_NON_DIGIT;
DELIMITER $$
CREATE FUNCTION STRIP_NON_DIGIT(input VARCHAR(255))
RETURNS VARCHAR(255)
BEGIN
DECLARE output VARCHAR(255) DEFAULT '';
DECLARE iterator INT DEFAULT 1;
WHILE iterator < (LENGTH(input) + 1) DO
IF SUBSTRING(input, iterator, 1) IN ( '0', '1', '2', '3', '4', '5', '6', '7', '8', '9' ) THEN
SET output = CONCAT(output, SUBSTRING(input, iterator, 1));
END IF;
SET iterator = iterator + 1;
END WHILE;
RETURN output;
END
$$
Upvotes: 48
Reputation: 11
I have a similar situation, matching products to barcodes where the barcode doesn't store none alpha numerics sometimes, so 102.2234 in the DB needs to be found when searching for 1022234.
In the end I just added a new field, reference_number to the products tables, and have php strip out the none alpha numerics in the product_number to populate reference_number whenever a new products is added.
You'd need to do a one time scan of the table to create all the reference_number fields for existing products.
You can then setup your index, even if speed is not a factor for this operation, it is still a good idea to keep the database running well so this query doesn't bog it down and slow down other queries.
Upvotes: 1
Reputation: 536349
There's no regexp replace, only a plain string REPLACE().
MySQL has the REGEXP operator, but it's only a match tester not a replacer, so you would have to turn the logic inside-out:
SELECT * FROM foo WHERE bar REGEXP '[^0-9]*1[^0-9]*2[^0-9]*3[^0-9]*4[^0-9]*5[^0-9]*';
This is like your version with LIKE but matches more accurately. Both will perform equally badly, needing a full table scan without indexes.
Upvotes: 7
Reputation: 7885
How big is table with foo? If it is small, and speed really doesn't matter, you might pull the row ID and foo, loop over it using the PHP replace functions to compare, and then pull the info you want by row number.
Of course, if the table is too big, this won't work well.
Upvotes: 0
Reputation: 5554
There's no regex replace as far as I'm concerned, but I found this solution;
--Create a table with numbers
DROP TABLE IF EXISTS ints;
CREATE TABLE ints (i INT UNSIGNED NOT NULL PRIMARY KEY);
INSERT INTO ints (i) VALUES
( 1), ( 2), ( 3), ( 4), ( 5), ( 6), ( 7), ( 8), ( 9), (10),
(11), (12), (13), (14), (15), (16), (17), (18), (19), (20);
--Then extract the numbers from the specified column
SELECT
bar,
GROUP_CONCAT(SUBSTRING(bar, i, 1) ORDER BY i SEPARATOR '')
FROM foo
JOIN ints ON i BETWEEN 1 AND LENGTH(bar)
WHERE
SUBSTRING(bar, i, 1) IN ('0', '1', '2', '3', '4', '5', '6', '7', '8', '9')
GROUP BY bar;
It works for me and I use MySQL 5.0
Also I found this place that could help.
Upvotes: 2
Reputation: 15111
While it's not pretty and it shows results that don't match, this helps:
SELECT * FROM foo WHERE bar LIKE = '%1%2%3%4%5%'
I would still like to find a better solution similar to the item in the original question.
Upvotes: 7
Reputation: 138012
The simplest way I can think to do it is to use the MySQL REGEXP operator a la:
WHERE foo LIKE '1\D*2\D*3\D*4\D*5'
It's not especially pretty but MySQL doesn't have a preg_replace
function so I think it's the best you're going to get.
Personally, if this only-numeric data is so important, I'd keep a separate field just to contain the stripped data. It'll make your lookups a lot faster than with the regular expression search.
Upvotes: 5