Timo77

Reputation: 145

Remove specific link from HTML but leave anchor text using regular expressions

I have tried to remove specific links from an HTML string by using regular expressions.

I have an HTML string like this:

<a href="http://linkA.com/fdfdfdf">use this</a> to make this <a href="http://linkB.com/fdsfds">happen</a>

At the end I want it to look like this:

<a href="http://linkA.com/fdfdfdf">use this</a> to make this happen

I have tried many patterns. At first I removed all anchor tags with this:

</?a(|\s+[^>]+)>

Then I tried many regexes:

<a\s+(?:[^>]*?\s+)?href="linkB.com([^"]*)
/<a[^>]*href="http\:\/\/linkB.com([^"]*)"[^>]*>.*<\/a>/
<a href="[^"]*?linkB*?">.*?</a>

<a\s.*?href=["']([^"']*?linkB[^"']*?)[^>]*>.*?<\/a>

(?=.*href=\"([^\"]*linkB[^"]*)")<a [^>]+>
<a[^>]*puustelli[^>]*>[^<]*<\/a>

None of them does exactly what I need. The match has to be made on the domain part of the URL only. I want all links that point to linkB to disappear, but the anchor text should stay in place.

Upvotes: 0

Views: 818

Answers (2)

Nishant260190

Reputation: 13

Try this

(<\sa\shref=[^<]+<\sa)href="http:\/\/linkB\.com\/[^>]+(>happen<\/a>)

OR

(.*<\sa\s)href="http:\/\/linkB\.com\/[^>]+(>happen<\/a>)

Upvotes: 0

Francis Gagnon

Reputation: 3675

This regex will find the anchor tag with the href that contains 'linkB.com' and hold the text found between the anchor tags in capture group 1.

<a\s+href\s*=\s*"[^"]*?linkB\.com[^"]*">([^<]+)</a>
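To actually remove the link, replace each match with the contents of capture group 1. Here is a minimal sketch in Python. The language is an assumption, since the question does not name one, and the function name `strip_linkb` is made up for illustration:

```python
import re

# The pattern from this answer; group 1 captures the anchor text.
LINKB_ANCHOR = re.compile(
    r'<a\s+href\s*=\s*"[^"]*?linkB\.com[^"]*">([^<]+)</a>'
)

def strip_linkb(html: str) -> str:
    """Replace every linkB.com anchor with its text; leave other links alone."""
    return LINKB_ANCHOR.sub(r"\1", html)

html = ('<a href="http://linkA.com/fdfdfdf">use this</a> to make this '
        '<a href="http://linkB.com/fdsfds">happen</a>')
print(strip_linkb(html))
# <a href="http://linkA.com/fdfdfdf">use this</a> to make this happen
```

The linkA anchor survives because `[^"]*?` cannot cross the closing quote of its href value, so the pattern never finds `linkB.com` inside that tag.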

Note that this regex is very strict. It doesn't allow extra attributes in the anchor tag, nor does it allow tags to appear between the anchor tags. It can be made more flexible, but it will get ugly very quickly. If you need more flexibility than this regex offers, I think it would be best to use an HTML parser such as HTML Agility Pack.
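As a rough illustration of the parser route, here is a sketch that uses Python's standard-library `html.parser` as a stand-in for HTML Agility Pack (which is a .NET library). The class name `LinkStripper` and the function `strip_links` are hypothetical. It re-emits the markup as it goes and drops only the opening and closing tags of anchors whose href host is the given domain or a subdomain of it. It is not a full serializer: doctype declarations, for example, are not re-emitted, and end tags come out in lowercase.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse

class LinkStripper(HTMLParser):
    """Re-emit HTML, omitting <a> tags whose href points at `domain`."""

    def __init__(self, domain: str):
        super().__init__(convert_charrefs=False)  # keep entities as written
        self.domain = domain.lower()
        self.out = []
        self.dropped = []  # one flag per open <a>: True if its tags are dropped

    def _matches(self, href: str) -> bool:
        host = urlparse(href).hostname or ""  # hostname is already lowercased
        return host == self.domain or host.endswith("." + self.domain)

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            drop = self._matches(dict(attrs).get("href") or "")
            self.dropped.append(drop)
            if drop:
                return  # skip the opening tag, keep whatever is inside it
        self.out.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        self.out.append(self.get_starttag_text())  # e.g. <br/>

    def handle_endtag(self, tag):
        if tag == "a" and self.dropped and self.dropped.pop():
            return  # closing tag of a dropped anchor
        self.out.append(f"</{tag}>")

    def handle_data(self, data):
        self.out.append(data)

    def handle_entityref(self, name):
        self.out.append(f"&{name};")

    def handle_charref(self, name):
        self.out.append(f"&#{name};")

    def handle_comment(self, data):
        self.out.append(f"<!--{data}-->")

def strip_links(html: str, domain: str) -> str:
    parser = LinkStripper(domain)
    parser.feed(html)
    parser.close()
    return "".join(parser.out)
```

Unlike the strict regex, this handles extra attributes and nested tags: `strip_links('<a class="x" href="http://linkB.com/p"><b>happen</b></a>', "linkB.com")` gives `<b>happen</b>`.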

Upvotes: 2
