user3298846

Reputation: 69

Regex to get <a href> from a string in Java

Suppose I have this string:

<img class="size-full wp-image-10225" alt="animals" src="abc.jpg"> blah blah blah&nbsp;
<a href="http://en.wikipedia.org/wiki/Elephant">elephant is an animal</a>&nbsp;blah

I want a regex that gives me this output:

blah blah blah <a href="http://en.wikipedia.org/wiki/Elephant">elephant is an animal</a> blah

without the &nbsp; entities. I can call str.replace("&nbsp;", "") separately, but how do I get the part of the string that starts at "blah blah blah" and runs to the final "blah", keeping the link tag?

Upvotes: 0

Views: 614

Answers (1)

Jerry

Reputation: 71538

Maybe something like this?

^<[^>]*>\s*|&nbsp;

Java escaped:

^<[^>]*>\\s*|&nbsp;


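The first alternative, ^<[^>]*>\\s*, matches the img tag at the start of the string and any whitespace that follows it. The second alternative matches every &nbsp;. Pass the pattern to replaceAll with "" as the replacement string, so both kinds of match are deleted.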
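As a rough sketch of how that pattern could be applied with String.replaceAll (the class name StripLeadingTag and the method clean are made up for illustration, and the input is the question's sample joined onto one line):

```java
class StripLeadingTag {
    // Hypothetical helper: deletes one tag at the very start of the string
    // (plus any whitespace after it) and every literal "&nbsp;" entity.
    static String clean(String html) {
        return html.replaceAll("^<[^>]*>\\s*|&nbsp;", "");
    }

    public static void main(String[] args) {
        // Sample input from the question, written as a single line (an assumption).
        String input = "<img class=\"size-full wp-image-10225\" alt=\"animals\" src=\"abc.jpg\"> blah blah blah&nbsp;"
                + "<a href=\"http://en.wikipedia.org/wiki/Elephant\">elephant is an animal</a>&nbsp;blah";
        System.out.println(clean(input));
    }
}
```

Because the replacement is "", the text on either side of each &nbsp; ends up directly adjacent. If a plain space is wanted there instead, one option is two calls: first replace &nbsp; with " ", then strip the leading tag.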

You might want to use a proper HTML parser though, since it'll be less likely to break.

Upvotes: 2
