Reputation: 10294
I am trying to find all the rows from database whose field have non-anchor tags whose href attribute start with {clickurl} string. For example this -
<link foo="bar" href="{clickurl}http://wwww.google.com" ...
Or this (since it has one non-anchor tag matching the criteria) - http://wwww.google.com" ... http://wwww.google.com" ...
But not this (since it is an anchor tag) - http://wwww.google.com" ...
What I have done so far
With the following regexp, I am able to get all records where link tag has href attribute starting with {clickurl} -
SELECT bannerid FROM ox_banners WHERE htmltemplate REGEXP "<link[^>]*href\s*=\s*[\"'][^>]*{clickurl}(.*)[\"']"
But, since I need to search for not only link tags but any other tag (excluding anchor tag), I modified the regexp to -
SELECT bannerid FROM ox_banners WHERE htmltemplate REGEXP "<[!a][^>]*href\s*=\s*[\"'][^>]*{clickurl}(.*)[\"']"
But this is also returning rows where anchor tags contain this pattern.
Updates
With inputs from zx81, I am now using this expression<[^a][^>]*href[[:space:]]*=[[:space:]]*[\"'][^>]*{clickurl}(.*)[\"']
and only non anchor tags are matching in normal case, but in cases like the following, when href attribute is on a tag which is inside an echo statement within a PHP tag, it is also getting matched (not desired) since actually it is an href on an anchor tag -
<?php
$GLOBALS['test'] = '{clickurl}tel://test';
echo '<a href="{clickurl}test">Test</a>';
?>
The
I am still looking for this fix.
Upvotes: 0
Views: 1435
Reputation: 41838
Try this regex instead:
< *[^a][^>]+ *href *= *"{clickurl}
You were almost there. It looks like you had a small typo: you had [!a]
instead of [^a]
to mean "one character that is not an "a".
The [^a]
and [^>]
are almost the same. I am sure you know this, but in both cases, the ^
means "not", so [^>]
is any character that is not a >
Instead of *
if you want to allow not only for space characters but other kinds of whitespace, you can use [[:space:]]*
Thanks to Tuga for reminding me that \s
doesn't work in MySQL: it matches a literal "s". I had "spaced out" on this one. :)
Upvotes: 1
Reputation: 98961
Try this:
SELECT bannerid FROM ox_banners WHERE htmltemplate REGEXP ".*<[^a][^>]*href=\"\\{clickurl\\}.*";
Options: Case insensitive; Regex syntax only
Match any single character that is NOT a line break character (line feed) «.*»
Between zero and unlimited times, as few or as many times as needed to find the longest match in combination with the other quantifiers or alternatives «*»
Match the character “<” literally «<»
Match any single character that is NOT present in the list below and that is NOT a line break character (line feed) «[^a]»
The literal character “a” (case insensitive) «a»
Match any single character that is NOT present in the list below and that is NOT a line break character (line feed) «[^>]*»
Between zero and unlimited times, as few or as many times as needed to find the longest match in combination with the other quantifiers or alternatives «*»
The literal character “>” «>»
Match the character string “href="” literally (case insensitive) «href="»
Match the character “{” literally «\{»
Match the character string “clickurl” literally (case insensitive) «clickurl»
Match the character “}” literally «\}»
Match any single character that is NOT a line break character (line feed) «.*»
Between zero and unlimited times, as few or as many times as needed to find the longest match in combination with the other quantifiers or alternatives «*»
Upvotes: 2