Sathesh S

Reputation: 1323

Removing <a href > tag using regex

I want to extract the plain text from some HTML. I tried using a regex:

String target = val.replaceAll("<a.*</a>", "");

My main requirement is to remove everything between <a> and </a>, including the link text. With the code above, however, all the other content is removed as well. For example, given this input:

<a href="www.google.com">Google</a> This is a Google Link

<a href="www.yahoo.com">Yahoo</a> This is a Yahoo Link

Here I want to remove everything between <a> and </a>. The final output should be:

This is a Google Link This is a Yahoo Link

Upvotes: 10

Views: 19714

Answers (1)

p.s.w.g

Reputation: 149000

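Use a non-greedy (lazy) quantifier (*?). The greedy .* matches as much as possible, so <a.*</a> runs from the first <a to the last </a> on the line and deletes the text between the links as well; .*? stops at the first </a> instead. For example, to remove each link entirely: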

String target = val.replaceAll("<a.*?</a>", "");
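A minimal sketch comparing the greedy and lazy patterns on the question's sample input. It assumes both links sit on a single line (as the expected output suggests); the class name GreedyVsLazy and the whitespace clean-up step are illustrative additions, not part of the answer's one-liner.

```java
public class GreedyVsLazy {
    // The question's sample input, assumed here to be on a single line.
    static final String VAL =
            "<a href=\"www.google.com\">Google</a> This is a Google Link "
            + "<a href=\"www.yahoo.com\">Yahoo</a> This is a Yahoo Link";

    // Greedy ".*" backtracks only as far as the LAST "</a>",
    // so the text between the two links is deleted too.
    static String removeGreedy(String s) {
        return s.replaceAll("<a.*</a>", "");
    }

    // Lazy ".*?" stops at the FIRST "</a>" after each "<a"; the
    // leftover whitespace is then collapsed and trimmed.
    static String removeLazy(String s) {
        return s.replaceAll("<a.*?</a>", "").replaceAll("\\s+", " ").trim();
    }

    public static void main(String[] args) {
        System.out.println("[" + removeGreedy(VAL) + "]"); // [ This is a Yahoo Link]
        System.out.println("[" + removeLazy(VAL) + "]");   // [This is a Google Link This is a Yahoo Link]
    }
}
```

If the markup spans several lines, note that . does not match line terminators by default in java.util.regex; prefixing the pattern with (?s) (DOTALL mode) lets a link that is split across lines be removed too.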

Or to replace the link with just the link tag's contents:

String target = val.replaceAll("<a[^>]*>(.*?)</a>", "$1");
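To keep only the link text, the replacement can refer back to the captured group with $1. A small sketch, where the helper name unwrapLinks is hypothetical:

```java
public class KeepLinkText {
    // Group 1 captures the text between the opening <a ...> tag and </a>;
    // "$1" in the replacement writes that captured text back in place of the whole link.
    static String unwrapLinks(String s) {
        return s.replaceAll("<a[^>]*>(.*?)</a>", "$1");
    }

    public static void main(String[] args) {
        String val = "<a href=\"www.google.com\">Google</a> This is a Google Link";
        System.out.println(unwrapLinks(val)); // Google This is a Google Link
    }
}
```

[^>]* consumes the attributes up to the first >, so it assumes no attribute value contains a literal > character.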

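However, regular expressions cannot reliably parse arbitrary HTML (nested tags, or a > inside an attribute value, will trip them up), so I would still recommend using a proper DOM manipulation API.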
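As one possible illustration of the DOM approach using only the JDK, the sketch below parses the fragment with javax.xml.parsers, removes every a element, and reads back the remaining text. It assumes the input is well-formed XML once wrapped in a single (hypothetical) root element; real-world HTML often is not (unclosed tags, &nbsp; entities), in which case a dedicated HTML parser would be needed instead. The class name DomLinkStripper is illustrative.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class DomLinkStripper {
    // Assumes the fragment becomes well-formed XML once wrapped in <root>...</root>;
    // otherwise parse() throws a SAXParseException.
    static String stripLinks(String fragment) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader("<root>" + fragment + "</root>")));

        // getElementsByTagName returns a live list, so remove from the end
        // to keep the remaining indices valid.
        NodeList links = doc.getElementsByTagName("a");
        for (int i = links.getLength() - 1; i >= 0; i--) {
            Node link = links.item(i);
            link.getParentNode().removeChild(link);
        }

        // Concatenated text of what is left, with whitespace runs collapsed.
        return doc.getDocumentElement().getTextContent().replaceAll("\\s+", " ").trim();
    }

    public static void main(String[] args) throws Exception {
        String val = "<a href=\"www.google.com\">Google</a> This is a Google Link\n"
                + "<a href=\"www.yahoo.com\">Yahoo</a> This is a Yahoo Link";
        System.out.println(stripLinks(val)); // This is a Google Link This is a Yahoo Link
    }
}
```

Unlike the regex, this handles links split across lines and attributes containing >, because the parser, not a pattern, decides where each element ends.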

Upvotes: 28
