Mike.sof
Mike.sof

Reputation: 3

Conditional SQL replace

Is it possible to conditionally replace parts of strings in MySQL?

Introduction to a problem: Users in my database stored articles (table called "table", column "value", each row = one article) with wrong links to images. I'd like to repair all of them at once. To do that, I have to replace all of the addresses in "href" links that are followed by images, i.e., <a href="link1"><img src="link2"></a> should by replaced by <a href="link2"><img src="link2"></a> My idea is to search for each "href" tag and if the tag is followed by and "img", than I'd like to obtain "link2" from the image and use it replace "link1". I know how to do it in bash or python but I do not have enough experience with MySQL.

To be specific, my table contains references to images like

<a href="www.a.cz/b/c"><img class="image image-thumbnail " src="www.d.cz/e/f.jpg" ...

I'd like to replace the first adress (href) by the image link. To get

<a href="www.d.cz/e/f.jpg"><img class="image image-thumbnail " src="www.d.cz/e/f.jpg" ...

Is it possible to make a query (queries?) like

UPDATE `table` 
SET value = REPLACE(value, 'www.a.cz/b/c', 'XXX') 
WHERE `value` LIKE '%www.a.cz/b/c%'

where XXX differs every time and its value is obtained from the database? Moreover, "www.a.cz/b/c" varies.

To make things complicated, not all of the images have the "href" link and not all of the links refer to images. There are three possibilities:

  1. "href" followed by "img" -> replace

  2. "href" not followed by "img" -> keep original link (probably a link to another page)

  3. "img" without "href" -> do nothing (there is no wrong link to replace)

Of course, some of the images may have a correct link. In this case it may be also replaced (original and new will be the same).

Database info from phpMyAdmin

Software: MariaDB

Software version: 10.1.32-MariaDB - Source distribution

Protocol version: 10

Server charset: UTF-8 Unicode (utf8)

Apache

Database client version: libmysql - 5.6.15

PHP extension: mysqli

Thank you in advance

Upvotes: 0

Views: 783

Answers (2)

Mike.sof
Mike.sof

Reputation: 3

Solved, thanks to @MatBailie , but I had to modified his answer. The final query, including the update, is

UPDATE `table` 
SET value = REGEXP_REPLACE(value, '(.*)<a href="([^"]+)"><img class="([^"]+)" src="([^"]+)"(.*)', '\\1<a href="\\4"><img class="\\3" src="\\4"\\5'

)

A wildcard (.*) had to be put at the beginning of the search because the link is included in an article (long text) and, consequently, the arguments of the replace pattern are increased.

Upvotes: 0

MatBailie
MatBailie

Reputation: 86775

SELECT
  regexp_replace(
    value,
    '^<a href="([^"]+)"><img class="([^"]+)" src="([^"]+)"(.*)$',
    '<a href="\\3"><img class="\\2" src="\\3"\\4'
  )
FROM
  yourTable

The replacement only happens if the pattern is matched.

  • ^ at the start means start of the string
  • ([^"]+) means one of more characters, excluding "
  • (.*) means zero or more of any character
  • $ at the end means end of the string

The replacement takes the 3rd "pattern enclosed in braces" (back-reference) and puts it where the 1st "pattern enclosed in braces" (back-reference) was.

The 2nd, 3rd and 4th back-references are replaced with themselves (no change).

https://dbfiddle.uk/?rdbms=mariadb_10.2&fiddle=96aef2214f844a1466772f41415617e5

If you have strings that don't exactly match the pattern, it will do nothing. Extra spaces will trip it up, for example.

In which case you need to work out a new regular expression that always matches all of the strings you want to work on. Then you can use the \\n back-references to make replacements.

For example, the following deals with extra spaces in the href tag...

SELECT
  regexp_replace(
    value,
    '^<a[ ]+href[ ]*=[ ]*"([^"]+)"><img class="([^"]+)" src="([^"]+)"(.*)$',
    '<a href="\\3"><img class="\\2" src="\\3"\\4'
  )
FROM
  yourTable

EDIT:

Following comments clarifying that these are actually snippets from the MIDDLE of the string...

https://dbfiddle.uk/?rdbms=mariadb_10.2&fiddle=48ce1cc3df5bf4d3d140025b662072a7

UPDATE
  yourTable
SET
  value = REGEXP_REPLACE(
            value,
            '<a href="([^"]+)"><img class="([^"]+)" src="([^"]+)"',
            '<a href="\\3"><img class="\\2" src="\\3"'
         )
WHERE
  value REGEXP '<a href="([^"]+)"><img class="([^"]+)" src="([^"]+)"'

(Though I prefer the syntax RLIKE, it's functionally identical.)

This will also find an replace that pattern multiple times. You're not clear if that's desired or possible.

Upvotes: 1

Related Questions