Reputation: 1098
I have very large database to cleanup. On the old system user was permitted to use HTML (and paste from Word files). Now I've large amount of open tags in different places in DB.
Anyone know a application/script to perform this kind of cleanup?
Upvotes: 3
Views: 2499
Reputation: 315
What if you have < and > characters as a part of the text, not a part of HTML. Even if your string contains broken HTML tag at the end of the string. So then you should use this version of the function:
SET GLOBAL log_bin_trust_function_creators=1;
DROP FUNCTION IF EXISTS fnStripTags;
DELIMITER |
CREATE FUNCTION fnStripTags( Dirty varchar(4096) )
RETURNS varchar(4096)
DETERMINISTIC
BEGIN
DECLARE iStart, iEnd, iLength, DirtyLength, DirtyTail int;
DECLARE iNextLessThenAngleBracket, iLengthToNextLessThenAngleBracket int;
WHILE Locate( '<', Dirty ) > 0 And Locate( '>', Dirty, Locate( '<', Dirty )) > 0 DO
BEGIN
SET iStart = Locate( '<', Dirty ), iEnd = Locate( '>', Dirty, Locate('<', Dirty ));
SET iNextLessThenAngleBracket = Locate( '<', Dirty, Locate('<', Dirty ) + 1);
SET iLength = ( iEnd - iStart) + 1;
SET iLengthToNextLessThenAngleBracket = ( iNextLessThenAngleBracket - iStart) + 1;
IF iLength < iLengthToNextLessThenAngleBracket THEN
BEGIN
IF iLength > 0 THEN
BEGIN
SET Dirty = Insert( Dirty, iStart, iLength, '');
END;
END IF;
END;
ELSE
BEGIN
IF iNextLessThenAngleBracket != 0 THEN
BEGIN
SET Dirty = Insert( Dirty, iStart, 1, '*/*');
END;
ELSE
BEGIN
SET Dirty = Insert( Dirty, iStart, iLength, '');
END;
END IF;
END;
END IF;
END;
END WHILE;
IF Locate( '<', Dirty ) > 0 THEN
BEGIN
SET DirtyLength = CHAR_LENGTH(Dirty);
SET DirtyTail = DirtyLength - Locate( '<', Dirty ) + 1;
SET Dirty = Insert( Dirty, Locate( '<', Dirty ), DirtyTail, '');
END;
END IF;
BEGIN
SET Dirty = REPLACE(Dirty, '*/*', '<');
END;
RETURN Dirty;
END;
|
DELIMITER ;
Upvotes: 0
Reputation: 76
You may do this with some bits of php (or another backend language)
Here is an example written on php
$link = mysql_connect($host, $username, $password); //connecting to database
mysql_select_db($database_name,$link); // select a database
$q = mysql_query('SELECT id_row,cell_with_tags FROM tablename WHERE some_condition_if_it_needed');
if(mysql_num_rows($q) !== 0){ //check if result of our query not empty
while($res = mysql_fetch_assoc($q)){ // gets all rows-cells as associative array
$result[] = array('id_row'=>$res['id_row'],
'cell_with_tags'=>strip_tags($res['cell_with_tags']);
}
}
if(!empty($result)){
foreach($result as $k=>&$v){ // update our database. Write there values without tags
mysql_query('UPDATE tablename SET cell_with_tags = '.$result[$k]['cell_with_tags'].' WHERE id='.$result[$k]['id_row']);
}
}
You also can modify call of strip_tags function, if you want to remove not all html tags. (http://ru.php.net/manual/en/function.strip-tags.php)
If executing of php script is not possible, you can do this only with mysql. I get it from this
DROP FUNCTION IF EXISTS strip_tags ||
CREATE FUNCTION strip_tags( x longtext) RETURNS longtext
LANGUAGE SQL NOT DETERMINISTIC READS SQL DATA
BEGIN
DECLARE sstart INT UNSIGNED;
DECLARE ends INT UNSIGNED;
SET sstart = LOCATE('<', x, 1);
REPEAT
SET ends = LOCATE('>', x, sstart);
SET x = CONCAT(SUBSTRING( x, 1 ,sstart -1) ,SUBSTRING(x, ends +1 )) ;
SET sstart = LOCATE('<', x, 1);
UNTIL sstart < 1 END REPEAT;
return x;
END;
||
delimiter ;
mysql> SELECT strip_tags('<a href="HelloWorld.html"><B>Hi, mate!</B></a>') as strip_tags;
Upvotes: 2