Reputation: 2235
I've got a table that contains hundreds of guides with screenshots. The screenshot images were wrapped in anchor tags because they used to be clickable, but now I need to remove those anchor tags. Every anchor tag to be removed has an href of #screenshot followed by a number, as in the example below. My plan is to dump the table using mysqldump and then use sed to find and replace the correct strings.
<p>Choose <a href="/components">components</a> to install and click next.</p>
<div class="screen">
<a href="#screenshot3"><img src="/images/screens/install/step3.jpg" alt="Step 3"></a>
</div>
Should be
<p>Choose <a href="/components">components</a> to install and click next.</p>
<div class="screen">
<img src="/images/screens/install/step3.jpg" alt="Step 3">
</div>
I can match the opening tag using <a\shref\=\"#screenshot\d+\"\> but I also need to match its closing tag, so that both can be removed without removing other anchor tags. Any help would be greatly appreciated!
Upvotes: 0
Views: 495
Reputation: 21863
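You can try replacing
<a\shref\=\"#screenshot\d+\"\>(.*)<\/a>
with \1.
The parentheses capture everything matched between them, so you can restore it in the replacement using \1, \2, and so on.
Two caveats: (.*) is greedy, so on a line with more than one </a> it will match up to the last one and swallow other anchors; a narrower group such as (<img[^>]*>) is safer. Also, \d is not a digit class in sed, so use [0-9] there instead.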
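For sed specifically, the substitution described above might look like the sketch below. It makes a few assumptions that are not in the question: that each anchor wraps exactly one img tag and nothing else, that the dump escapes double quotes as \" (which mysqldump normally does inside string values, hence the optional backslash in the pattern), and that your sed accepts -E for extended regular expressions. The function name strip_screenshot_anchors and the file names in the usage comment are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: reads text on stdin and writes it to stdout with
# <a href="#screenshotN">...</a> removed around a single <img> tag.
#
#   \\?"          a double quote, optionally preceded by a backslash
#                 (to cope with \" escaping in a SQL dump)
#   [0-9]+        the screenshot number (sed has no \d digit class)
#   (<img[^>]*>)  captures only the img tag, so the match cannot run on
#                 to a later </a> on the same line
#   \1            puts the captured img tag back
strip_screenshot_anchors() {
  sed -E 's|<a href=\\?"#screenshot[0-9]+\\?">(<img[^>]*>)</a>|\1|g'
}

# Usage (file names are placeholders):
#   strip_screenshot_anchors < dump.sql > dump.cleaned.sql
```

Because the capture group is restricted to an img tag, other anchors such as the /components link are left alone even when the whole row sits on one line of the dump. It is worth diffing the cleaned dump against the original before re-importing it.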
Keep in mind though that regexes are not the right weapon to use when trying to modify HTML. Read this (and the comments around it) for an explanation.
Upvotes: 1