Reputation: 761
Need help to form the MYSQL query from table one column having the bellow content
Row1 : this is first <a href='mytext.txt'>row</a> from the table
Row 2 : THis is the second row <img src ='mytext.jpg'> my image is there
Row 3 : <p>This is the Third row my mytext is there </p>
Row 4 : <p class='te_mytext'>This is the Third row my text is there </p>
this is the table rows i try to search the keyword as 'mytext'
my query is
SELECT * from table WHERE colmn_name ` like '%mytext%' "
I will get all the 4 rows as result but the result is wrong. I need to get the correct output as only Row 3. The reason this row only having the mytext inside the content all other are not in content but mytext having in all rows
How can I write the MySQL query?
Upvotes: 34
Views: 106091
Reputation: 479
I think there is no need for such extraordinary logic. It can be simplified like...
-- set @StrRow = 'this is first <a href="mytext.txt">row</a> from the table';
-- set @StrRow = 'THis is the second row <img src ="mytext.jpg"> my image is there';
set @StrRow = '<p>This is the Third row my mytext is there </p>';
-- set @StrRow = '<p class="te_mytext">This is the Third row my text is there </p>';
set @MyText = 'mytext';
select locate('<', @StrRow, locate(@MyText, @StrRow)) as '<', locate(@MyText, @StrRow) as MyText, locate('>', @StrRow, locate(@MyText, @StrRow)) as '>'
from xyz
where
locate('<', @StrRow, locate(@MyText, @StrRow)) > 0 and
locate('<', @StrRow, locate(@MyText, @StrRow)) < locate('>', @StrRow, locate(@MyText, @StrRow))
Upvotes: 2
Reputation: 21
Expanded the strip_tags function as provided by Boann. It now can be used to either keep or ignore the phrase between the tags. Note the bug with empty tags in my part of the code, ie. for $keep_phrase = false.
CREATE FUNCTION strip_tags($str text, $tag text,$keep_phrase bool) RETURNS text
BEGIN
DECLARE $start, $end INT DEFAULT 1;
SET $str = COALESCE($str, '');
LOOP
SET $start = LOCATE(CONCAT('<', $tag), $str, $start);
IF (!$start) THEN RETURN $str; END IF;
IF ($keep_phrase) THEN
SET $end = LOCATE('>', $str, $start);
IF (!$end) THEN SET $end = $start; END IF;
SET $str = INSERT($str, $start, $end - $start + 1, '');
SET $str = REPLACE($str, CONCAT('</', $tag, '>'), '');
ELSE
SET $end = LOCATE(CONCAT('</', $tag, '>'),$str,$start);
IF (!$end) THEN
SET $end = LOCATE('/>',$str,$start);
SET $str = INSERT($str, $start, $end - $start + 2, '');
ELSE
SET $str = INSERT($str, $start, $end - $start
+ LENGTH(CONCAT('</', $tag, '>')), '');
END IF;
END IF;
END LOOP;
END //
To prove the functionality:
SELECT strip_tags('<p>so<span id="x"> very</span> cool</p><p>so<span id="y"> very</span> cool</p>','span',true);
<p>so very cool</p><p>so very cool</p>
SELECT strip_tags('<p>so<span id="x"> very</span> cool</p><p>so<span id="y"> very</span> cool</p>','span',false);
<p>so cool</p><p>so cool</p>
Empty elements with $keep_phrase = false is not supported, see:
SELECT strip_tags('<p>so<span id="x"> very</span> cool</p><span/><p>so<span id="y"> very</span> cool</p>','span',false);
<p>so cool</p> cool</p>
Upvotes: 2
Reputation:
Regular expression matching is needed here, with negative look-ahead assertion: "mytext" not followed by a closing tag. MySQL still, as of 8.0, doesn't support look assertions, but MariaDB does. MariaDB query for the question:
SELECT * FROM table WHERE column_name REGEXP 'mytext(?![^<>]*>)';
Another solution for the question is to remove some / all tags before matching. It's inefficient, compared with REGEXP, but also works. In MySQL starting with 8.0 and MariaDB starting with 10.0.5, there is built-in REGEXP_REPLACE
function. 'strip_html' is even the first example in MariaDB corresponding documentation page. MySQL / MariaDB query for such approach:
SELECT * FROM table WHERE REGEXP_REPLACE (column_name, '<.+?>', '') LIKE '%mytext%';
Apart from that, strings in the question mix data and representation. They shouldn't be regularly searched as it is a waste of system resources.
Upvotes: 7
Reputation: 1492
MySQL strip tags implementation, allowing you to target a specific tag so that we can replace out the tags one by one with each function call. You just need pass the tag parameter, e.g. 'a'
to replace out all opening/closing anchor tags.
# MySQL function to programmatically replace out specified html tags from text/html fields
# run this to drop/update the stored function
DROP FUNCTION IF EXISTS `strip_tags`;
DELIMITER |
# function to nuke all opening and closing tags of type specified in argument 2
CREATE FUNCTION `strip_tags`($str text, $tag text) RETURNS text
BEGIN
DECLARE $start, $end INT DEFAULT 1;
SET $str = COALESCE($str, '');
LOOP
SET $start = LOCATE(CONCAT('<', $tag), $str, $start);
IF (!$start) THEN RETURN $str; END IF;
SET $end = LOCATE('>', $str, $start);
IF (!$end) THEN SET $end = $start; END IF;
SET $str = INSERT($str, $start, $end - $start + 1, '');
SET $str = REPLACE($str, CONCAT('</', $tag, '>'), '');
END LOOP;
END;
| DELIMITER ;
# test select to nuke all opening <a> tags
SELECT
STRIP_TAGS(description, 'a') AS stripped
FROM
tmpcat;
# run update query to replace out all <a> tags
UPDATE tmpcat
SET
description = STRIP_TAGS(description, 'a');
Upvotes: 2
Reputation: 931
I used strip_tags() above (Thanks Boann) with a slight mod to be rid of the html codes of accented chars etc. Like so:
...BEGIN
DECLARE $start, $end INT DEFAULT 1;
SET $str = REPLACE($str, " ", " ");
SET $str = REPLACE($str, "€", "€");
SET $str = REPLACE($str, "á", "á");
SET $str = REPLACE($str, "é", "é");
SET $str = REPLACE($str, "í", "í");
SET $str = REPLACE($str, "ó", "ó");
SET $str = REPLACE($str, "ú", "ú");
LOOP...
Upvotes: 2
Reputation: 31
Add these lines to fnStripTags
function
after SET Dirty = Insert( Dirty, iStart, iLength, '');
set Dirty = Replace(Dirty,' ',''); #No space between & and nbsp;
set Dirty = Replace(Dirty,'\r','');
set Dirty = Replace(Dirty,'\n','');
Upvotes: 3
Reputation: 50010
Here's my implementation of a strip_tags function:
CREATE FUNCTION `strip_tags`($str text) RETURNS text
BEGIN
DECLARE $start, $end INT DEFAULT 1;
LOOP
SET $start = LOCATE("<", $str, $start);
IF (!$start) THEN RETURN $str; END IF;
SET $end = LOCATE(">", $str, $start);
IF (!$end) THEN SET $end = $start; END IF;
SET $str = INSERT($str, $start, $end - $start + 1, "");
END LOOP;
END;
I made sure it removes mismatched opening brackets because they're dangerous, though it ignores any unpaired closing brackets because they're harmless.
mysql> select strip_tags('<span>hel<b>lo <a href="world">wo<>rld</a> <<x>again<.');
+----------------------------------------------------------------------+
| strip_tags('<span>hel<b>lo <a href="world">wo<>rld</a> <<x>again<.') |
+----------------------------------------------------------------------+
| hello world again. |
+----------------------------------------------------------------------+
1 row in set
Enjoy.
Upvotes: 30
Reputation: 331
If your content always start with tags(<body> etc.)
try this:
SELECT * from table WHERE colmn_name REGEXP '>[^<]*mytext';
Upvotes: 14
Reputation: 3657
try this solution: not tried it myself but apparently it works.
source: http://forums.mysql.com/read.php?52,177343,177985#msg-177985
SET GLOBAL log_bin_trust_function_creators=1;
DROP FUNCTION IF EXISTS fnStripTags;
DELIMITER |
CREATE FUNCTION fnStripTags( Dirty varchar(4000) )
RETURNS varchar(4000)
DETERMINISTIC
BEGIN
DECLARE iStart, iEnd, iLength int;
WHILE Locate( '<', Dirty ) > 0 And Locate( '>', Dirty, Locate( '<', Dirty )) > 0 DO
BEGIN
SET iStart = Locate( '<', Dirty ), iEnd = Locate( '>', Dirty, Locate('<', Dirty ));
SET iLength = ( iEnd - iStart) + 1;
IF iLength > 0 THEN
BEGIN
SET Dirty = Insert( Dirty, iStart, iLength, '');
END;
END IF;
END;
END WHILE;
RETURN Dirty;
END;
|
DELIMITER ;
SELECT fnStripTags('this <html>is <b>a test</b>, nothing more</html>');
Upvotes: 55
Reputation: 17981
You cannot parse HTML inside of SQL queries, that doesn't make any sense. Perhaps you could just maintain a special search version of the table with all the HTML stripped, but you would have to use some external processing to do that.
Upvotes: -3