Alisa
Alisa

Reputation: 3082

Convert all characters except a set of characters to whitespace in MySQL

I have a set of characters that are defined as valid characters. Let me define the valid string by the following regexp:

^[a-zA-Z0-9\ .-_]+$

(alphanumeric, space, dot, dash and underscore)

The question is that given a column containing a lot of invalid characters, how I can run an update to convert each invalid character to one space? And then possibly convert consequent spaces to one space?

I cannot run several replace commands because there are a lot of possible invalid characters. So I expect a regexp solution.

Currently, I am doing the task in Java (after exporting the table to tsv format). But I want a MySQL approach.

Upvotes: 0

Views: 394

Answers (1)

Jonny 5
Jonny 5

Reputation: 12389

If your MySQL version supports it, create a function:

DELIMITER $$
CREATE FUNCTION my_func_1 (str TEXT)
  RETURNS TEXT
BEGIN
  DECLARE ret TEXT DEFAULT '';
  DECLARE chr TEXT DEFAULT '';
  DECLARE i INT DEFAULT 1;
  WHILE i < (LENGTH(str) + 1) DO
    SET chr = SUBSTRING(str, i, 1);
    IF chr REGEXP '[-a-zA-Z0-9\\_.]'
      THEN SET ret = CONCAT(ret, chr);
    ELSE
      SET ret = CONCAT(ret, ' ');
    END IF;
    SET i = i + 1;
  END WHILE;
  WHILE ret LIKE '%  %' DO
    SET ret = REPLACE(ret, '  ', ' ');
  END WHILE;
  RETURN TRIM(ret);
END;
$$
DELIMITER ;

Test it a bit:

SELECT my_func_1('$a-B\\?!=01._%'); > a-B\ 01._

and update with SET col = my_func_1(col)


If not needed anymore:

DROP FUNCTION IF EXISTS my_func_1;

Also I changed your regex a bit as - indicates a range, if between characters in a class or is .-_ intended? Then modify the pattern.

Upvotes: 1

Related Questions