Sandeepan Nath
Sandeepan Nath

Reputation: 10294

mysql regexp for matching fields containing non-anchor tags with href attribute containing a pattern

I am trying to find all the rows from database whose field have non-anchor tags whose href attribute start with {clickurl} string. For example this -

<link foo="bar" href="{clickurl}http://wwww.google.com" ...

Or this (since it has one non-anchor tag matching the criteria) - http://wwww.google.com" ... http://wwww.google.com" ...

But not this (since it is an anchor tag) - http://wwww.google.com" ...

What I have done so far

With the following regexp, I am able to get all records where link tag has href attribute starting with {clickurl} -

SELECT bannerid FROM ox_banners WHERE htmltemplate REGEXP "<link[^>]*href\s*=\s*[\"'][^>]*{clickurl}(.*)[\"']"

But, since I need to search for not only link tags but any other tag (excluding anchor tag), I modified the regexp to -

SELECT bannerid FROM ox_banners WHERE htmltemplate REGEXP "<[!a][^>]*href\s*=\s*[\"'][^>]*{clickurl}(.*)[\"']"

But this is also returning rows where anchor tags contain this pattern.

Updates

With inputs from zx81, I am now using this expression<[^a][^>]*href[[:space:]]*=[[:space:]]*[\"'][^>]*{clickurl}(.*)[\"'] and only non anchor tags are matching in normal case, but in cases like the following, when href attribute is on a tag which is inside an echo statement within a PHP tag, it is also getting matched (not desired) since actually it is an href on an anchor tag -

<?php

$GLOBALS['test'] = '{clickurl}tel://test';

echo '<a href="{clickurl}test">Test</a>';

?>

The

I am still looking for this fix.

Upvotes: 0

Views: 1435

Answers (2)

zx81
zx81

Reputation: 41838

Try this regex instead:

< *[^a][^>]+ *href *= *"{clickurl}

You were almost there. It looks like you had a small typo: you had [!a] instead of [^a] to mean "one character that is not an "a".

The [^a] and [^>] are almost the same. I am sure you know this, but in both cases, the ^ means "not", so [^>] is any character that is not a >

Instead of * if you want to allow not only for space characters but other kinds of whitespace, you can use [[:space:]]*

Thanks to Tuga for reminding me that \s doesn't work in MySQL: it matches a literal "s". I had "spaced out" on this one. :)

Upvotes: 1

Pedro Lobito
Pedro Lobito

Reputation: 98961

Try this:

SELECT bannerid FROM ox_banners WHERE htmltemplate REGEXP ".*<[^a][^>]*href=\"\\{clickurl\\}.*";


Options: Case insensitive; Regex syntax only
Match any single character that is NOT a line break character (line feed) «.*»
   Between zero and unlimited times, as few or as many times as needed to find the longest match in combination with the other quantifiers or alternatives «*»
Match the character “<” literally «<»
Match any single character that is NOT present in the list below and that is NOT a line break character (line feed) «[^a]»
   The literal character “a” (case insensitive) «a»
Match any single character that is NOT present in the list below and that is NOT a line break character (line feed) «[^>]*»
   Between zero and unlimited times, as few or as many times as needed to find the longest match in combination with the other quantifiers or alternatives «*»
   The literal character “>” «>»
Match the character string “href="” literally (case insensitive) «href="»
Match the character “{” literally «\{»
Match the character string “clickurl” literally (case insensitive) «clickurl»
Match the character “}” literally «\}»
Match any single character that is NOT a line break character (line feed) «.*»
   Between zero and unlimited times, as few or as many times as needed to find the longest match in combination with the other quantifiers or alternatives «*»

Upvotes: 2

Related Questions