Reputation: 3082
I have a set of characters that are defined as valid characters. Let me define the valid string by the following regexp:
^[a-zA-Z0-9\ .-_]+$
(alphanumeric, space, dot, dash and underscore)
The question is that given a column containing a lot of invalid characters, how I can run an update to convert each invalid character to one space? And then possibly convert consequent spaces to one space?
I cannot run several replace commands because there are a lot of possible invalid characters. So I expect a regexp solution.
Currently, I am doing the task in Java (after exporting the table to tsv format). But I want a MySQL approach.
Upvotes: 0
Views: 394
Reputation: 12389
If your MySQL version supports it, create a function:
DELIMITER $$
CREATE FUNCTION my_func_1 (str TEXT)
RETURNS TEXT
BEGIN
DECLARE ret TEXT DEFAULT '';
DECLARE chr TEXT DEFAULT '';
DECLARE i INT DEFAULT 1;
WHILE i < (LENGTH(str) + 1) DO
SET chr = SUBSTRING(str, i, 1);
IF chr REGEXP '[-a-zA-Z0-9\\_.]'
THEN SET ret = CONCAT(ret, chr);
ELSE
SET ret = CONCAT(ret, ' ');
END IF;
SET i = i + 1;
END WHILE;
WHILE ret LIKE '% %' DO
SET ret = REPLACE(ret, ' ', ' ');
END WHILE;
RETURN TRIM(ret);
END;
$$
DELIMITER ;
Test it a bit:
SELECT my_func_1('$a-B\\?!=01._%');
> a-B\ 01._
and update with SET col = my_func_1(col)
If not needed anymore:
DROP FUNCTION IF EXISTS my_func_1;
Also I changed your regex a bit as -
indicates a range, if between characters in a class or is .-_
intended? Then modify the pattern.
Upvotes: 1