faq

Reputation: 3076

What is the MySQL query equivalent of PHP strip_tags?

I have a large database which contains records that have <a> tags in them and I would like to remove them. Of course there is the method where I create a PHP script that selects all, uses strip_tags and updates the database, but this takes a long time. So how can I do this with a simple (or complicated) MySQL query?

Upvotes: 30

Views: 49039

Answers (9)

Marco Marsala

Reputation: 2462

MySQL >= 5.5 provides XML functions to solve your issue:

SELECT ExtractValue(field, '//text()') FROM table;

Reference: https://dev.mysql.com/doc/refman/5.5/en/xml-functions.html
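One caveat worth checking before using this in an UPDATE: ExtractValue() parses the value as XML and, as far as I know, returns NULL with a warning when the markup is not well-formed, which is common for real-world HTML. It also removes every tag, not only <a>. A minimal sketch of a pre-check, assuming a hypothetical table `my_table` with a text column `my_field`:

```sql
-- Hypothetical names: my_table, my_field.
-- Counts non-NULL rows for which ExtractValue() returns NULL,
-- i.e. rows that an UPDATE based on it would set to NULL.
SELECT COUNT(*) AS would_be_lost
FROM my_table
WHERE my_field IS NOT NULL
  AND ExtractValue(my_field, '//text()') IS NULL;
```

This only catches the NULL case, so it is probably still worth comparing a sample of the output against the original by eye.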

Upvotes: 43

Gene Kelly

Reputation: 199

Compatible with MySQL 8+ and MariaDB 10.0.5+

SELECT REGEXP_REPLACE(body, '<[^>]*>+', '') FROM app_cms_sections
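Since the question is only about <a> tags, the pattern can be narrowed. A sketch under the same version assumptions, reusing the table and column names from the query above. The POSIX class [[:space:]] is used instead of \s to sidestep string-escaping differences:

```sql
-- Removes <a ...> and </a> but keeps the link text and all other tags.
-- The optional group must start with whitespace, so <abbr> or <article> are left alone.
SELECT REGEXP_REPLACE(body, '</?a([[:space:]][^>]*)?>', '') AS body_without_links
FROM app_cms_sections;
```

Whether this also matches an uppercase <A> should depend on the collation of the column, so test it on a few rows first.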

Upvotes: 3

ajmedway

Reputation: 1492

I extended @Boann's answer to allow targeting a specific tag, so that tags can be removed one type at a time with each function call. You just need to pass the tag parameter, e.g. 'a', to remove all opening and closing anchor tags. This answers the question asked by the OP, unlike the accepted answer, which strips out ALL tags.

# MySQL function to programmatically replace out specified html tags from text/html fields

# run this to drop/update the stored function
DROP FUNCTION IF EXISTS `strip_tags`;

DELIMITER |

# function to nuke all opening and closing tags of type specified in argument 2
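CREATE FUNCTION `strip_tags`($str text, $tag text) RETURNS text DETERMINISTIC
BEGIN
    DECLARE $start, $end INT DEFAULT 1;
    # a NULL input would otherwise make LOCATE() return NULL and loop forever
    SET $str = COALESCE($str, '');
    # closing tags have a fixed form, so remove them once, before the loop
    SET $str = REPLACE($str, CONCAT('</', $tag, '>'), '');
    tag_loop: LOOP
        SET $start = LOCATE(CONCAT('<', $tag), $str, $start);
        IF (!$start) THEN RETURN $str; END IF;
        # skip longer tag names, e.g. <abbr> when $tag is 'a'
        IF SUBSTRING($str, $start + CHAR_LENGTH($tag) + 1, 1) NOT IN ('>', ' ', '/') THEN
            SET $start = $start + 1;
            ITERATE tag_loop;
        END IF;
        SET $end = LOCATE('>', $str, $start);
        IF (!$end) THEN SET $end = $start; END IF;
        SET $str = INSERT($str, $start, $end - $start + 1, '');
    END LOOP;
END |

DELIMITER ;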

# test select to remove all opening and closing <a> tags
SELECT 
    STRIP_TAGS(description, 'a') AS stripped
FROM
    tmpcat;

# run update query to replace out all <a> tags
UPDATE tmpcat
SET 
    description = STRIP_TAGS(description, 'a');

Upvotes: 2

You Old Fool

Reputation: 22941

I'm using the lib_mysqludf_preg library for this and a regex like this:

SELECT PREG_REPLACE('#<[^>]+>#',' ',cell) FROM table;

I also did it like this for rows with encoded HTML entities:

SELECT PREG_REPLACE('#&lt;.+?&gt;#',' ',cell) FROM table;

There are probably cases where these might fail but I haven't encountered any and they're reasonably fast.

Upvotes: 1

Scott2B

Reputation: 59

Boann's function works once I added SET $str = COALESCE($str, ''); to it.

From a comment on that post:

Also to note, you may want to put a SET $str = COALESCE($str, ''); just before the loop otherwise null values may cause a crash/never ending query. – Tom C Aug 17 at 9:51

Upvotes: 1

phenicie

Reputation: 739

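I'm passing this code on; it seems very similar to the above. It worked for me, and I hope it helps.

# Function body only: the CREATE FUNCTION header is not shown. It expects a string parameter named Dirty.
BEGIN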
  DECLARE iStart, iEnd, iLength   INT;

  WHILE locate('<', Dirty) > 0 AND locate('>', Dirty, locate('<', Dirty)) > 0
  DO
    BEGIN
      SET iStart = locate('<', Dirty), iEnd = locate('>', Dirty, locate('<', Dirty));
      SET iLength = (iEnd - iStart) + 1;
      IF iLength > 0 THEN
        BEGIN
          SET Dirty = insert(Dirty, iStart, iLength, '');
        END;
      END IF;
    END;
  END WHILE;
  RETURN Dirty;
END

Upvotes: 6

Boann

Reputation: 50021

Here you go:

CREATE FUNCTION `strip_tags`($str text) RETURNS text
BEGIN
    DECLARE $start, $end INT DEFAULT 1;
    LOOP
        SET $start = LOCATE("<", $str, $start);
        IF (!$start) THEN RETURN $str; END IF;
        SET $end = LOCATE(">", $str, $start);
        IF (!$end) THEN SET $end = $start; END IF;
        SET $str = INSERT($str, $start, $end - $start + 1, "");
    END LOOP;
END;

I made sure it removes mismatched opening brackets because they're dangerous, though it ignores any unpaired closing brackets because they're harmless.

mysql> select strip_tags('<span>hel<b>lo <a href="world">wo<>rld</a> <<x>again<.');
+----------------------------------------------------------------------+
| strip_tags('<span>hel<b>lo <a href="world">wo<>rld</a> <<x>again<.') |
+----------------------------------------------------------------------+
| hello world again.                                                   |
+----------------------------------------------------------------------+
1 row in set
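To apply the function to stored data, here is a sketch assuming a hypothetical table `my_table` with a column `my_field`. The LIKE filter limits the work to rows that contain a '<' at all, and because NULL never matches LIKE, NULL values are never passed to the function:

```sql
-- Hypothetical names: my_table, my_field.
-- Preview a few rows first.
SELECT my_field, strip_tags(my_field) AS stripped
FROM my_table
WHERE my_field LIKE '%<%'
LIMIT 20;

-- Then rewrite only the rows that contain an opening bracket.
UPDATE my_table
SET my_field = strip_tags(my_field)
WHERE my_field LIKE '%<%';
```

On a large table it may be worth running the UPDATE in primary-key ranges, so that a single statement does not hold locks for the whole run.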

Upvotes: 26

Foxinni

Reputation: 4000

REPLACE() works pretty well.

The subtle approach:

 REPLACE(REPLACE(node.body,'<p>',''),'</p>','') as `post_content`

...and the not so subtle: (Converting strings into slugs)

 LOWER(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(TRIM(node.title), ':', ''), 'é', 'e'), ')', ''), '(', ''), ',', ''), '\\', ''), '\/', ''), '\"', ''), '?', ''), '\'', ''), '&', ''), '!', ''), '.', ''), '–', ''), ' ', '-'), '--', '-'), '--', '-'), '’', '')) as `post_name`

Upvotes: -2

user149341

Reputation:

I don't believe there's any efficient way to do this in MySQL alone.

MySQL does have a REPLACE() function, but it can only replace constant strings, not patterns. You could possibly write a MySQL stored function to search for and replace tags, but at that point you're probably better off writing a PHP script to do the job. It might not be quite as fast, but it will probably be faster to write.

Upvotes: 6
