Reputation: 3
Is it possible to conditionally replace parts of strings in MySQL?
Introduction to a problem: Users in my database stored articles (table called "table", column "value", each row = one article) with wrong links to images. I'd like to repair all of them at once. To do that, I have to replace all of the addresses in "href" links that are followed by images, i.e.,
<a href="link1"><img src="link2"></a>
should by replaced by
<a href="link2"><img src="link2"></a>
My idea is to search for each "href" tag and if the tag is followed by and "img", than I'd like to obtain "link2" from the image and use it replace "link1".
I know how to do it in bash or python but I do not have enough experience with MySQL.
To be specific, my table contains references to images like
<a href="www.a.cz/b/c"><img class="image image-thumbnail " src="www.d.cz/e/f.jpg" ...
I'd like to replace the first adress (href) by the image link. To get
<a href="www.d.cz/e/f.jpg"><img class="image image-thumbnail " src="www.d.cz/e/f.jpg" ...
Is it possible to make a query (queries?) like
UPDATE `table`
SET value = REPLACE(value, 'www.a.cz/b/c', 'XXX')
WHERE `value` LIKE '%www.a.cz/b/c%'
where XXX differs every time and its value is obtained from the database? Moreover, "www.a.cz/b/c" varies.
To make things complicated, not all of the images have the "href" link and not all of the links refer to images. There are three possibilities:
"href" followed by "img" -> replace
"href" not followed by "img" -> keep original link (probably a link to another page)
"img" without "href" -> do nothing (there is no wrong link to replace)
Of course, some of the images may have a correct link. In this case it may be also replaced (original and new will be the same).
Database info from phpMyAdmin
Software: MariaDB
Software version: 10.1.32-MariaDB - Source distribution
Protocol version: 10
Server charset: UTF-8 Unicode (utf8)
Apache
Database client version: libmysql - 5.6.15
PHP extension: mysqli
Thank you in advance
Upvotes: 0
Views: 783
Reputation: 3
Solved, thanks to @MatBailie , but I had to modified his answer. The final query, including the update, is
UPDATE `table`
SET value = REGEXP_REPLACE(value, '(.*)<a href="([^"]+)"><img class="([^"]+)" src="([^"]+)"(.*)', '\\1<a href="\\4"><img class="\\3" src="\\4"\\5'
)
A wildcard (.*) had to be put at the beginning of the search because the link is included in an article (long text) and, consequently, the arguments of the replace pattern are increased.
Upvotes: 0
Reputation: 86775
SELECT
regexp_replace(
value,
'^<a href="([^"]+)"><img class="([^"]+)" src="([^"]+)"(.*)$',
'<a href="\\3"><img class="\\2" src="\\3"\\4'
)
FROM
yourTable
The replacement only happens if the pattern is matched.
^
at the start means start of the string
([^"]+)
means one of more characters, excluding "
(.*)
means zero or more of any character$
at the end means end of the string
The replacement takes the 3rd "pattern enclosed in braces" (back-reference) and puts it where the 1st "pattern enclosed in braces" (back-reference) was.
The 2nd, 3rd and 4th back-references are replaced with themselves (no change).
https://dbfiddle.uk/?rdbms=mariadb_10.2&fiddle=96aef2214f844a1466772f41415617e5
If you have strings that don't exactly match the pattern, it will do nothing. Extra spaces will trip it up, for example.
In which case you need to work out a new regular expression that always matches all of the strings you want to work on. Then you can use the \\n
back-references to make replacements.
For example, the following deals with extra spaces in the href
tag...
SELECT
regexp_replace(
value,
'^<a[ ]+href[ ]*=[ ]*"([^"]+)"><img class="([^"]+)" src="([^"]+)"(.*)$',
'<a href="\\3"><img class="\\2" src="\\3"\\4'
)
FROM
yourTable
EDIT:
Following comments clarifying that these are actually snippets from the MIDDLE of the string...
https://dbfiddle.uk/?rdbms=mariadb_10.2&fiddle=48ce1cc3df5bf4d3d140025b662072a7
UPDATE
yourTable
SET
value = REGEXP_REPLACE(
value,
'<a href="([^"]+)"><img class="([^"]+)" src="([^"]+)"',
'<a href="\\3"><img class="\\2" src="\\3"'
)
WHERE
value REGEXP '<a href="([^"]+)"><img class="([^"]+)" src="([^"]+)"'
(Though I prefer the syntax RLIKE
, it's functionally identical.)
This will also find an replace that pattern multiple times. You're not clear if that's desired or possible.
Upvotes: 1