user3287034

Reputation: 27

Exception while parsing XML in Java

I am trying to parse content using DocumentBuilder.

<html>
<head>
<meta charset="utf-8" />
<title>Test</title>
</head>
<body>
<img height="" src="google.gif?<>" />
</body>
</html>

Parsing fails with an exception saying that the value of src cannot contain <. I need to parse the document because I am applying an XSL transformation to it.

Is there any way to do this? For now, I unescape the content, parse it with DocumentBuilder, and escape it again.

I retrieve the XML above as a String from a database. When I parse it with DocumentBuilder, I get the exception about < in src. I tried escaping it with StringEscapeUtils.escapeHtml, but that escapes the complete String, including the tags, so DocumentBuilder still cannot parse it. How can I escape only the src value? I have not been able to accomplish that.

Upvotes: 0

Views: 995

Answers (1)

Nikolas

Reputation: 44476

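The characters < and > delimit tags in XML, so the parser treats them as markup; in particular, a raw < is not allowed inside an attribute value. XML has predefined entities for these characters, and you have to use that notation instead. Read more on Wikipedia.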

  • &gt; for >
  • &lt; for <
  • &quot; for "
  • &apos; for '
  • &amp; for &
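As a rough illustration of the table above, here is a minimal sketch of a helper that replaces the five characters with their entities. The class name `XmlEntityEscaper` and the method `escapeXml` are made up for this example and are not part of any library. It is meant to be applied to a raw attribute value only, not to a whole document, since that would escape the tags as well.

```java
public class XmlEntityEscaper {

    // Hypothetical helper: replaces the five characters that have predefined
    // XML entities. Apply it to a raw text or attribute value, not to markup.
    public static String escapeXml(String raw) {
        StringBuilder out = new StringBuilder(raw.length() + 16);
        for (int i = 0; i < raw.length(); i++) {
            char c = raw.charAt(i);
            switch (c) {
                case '&':  out.append("&amp;");  break;
                case '<':  out.append("&lt;");   break;
                case '>':  out.append("&gt;");   break;
                case '"':  out.append("&quot;"); break;
                case '\'': out.append("&apos;"); break;
                default:   out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Build the img element with only the src value escaped.
        String src = escapeXml("google.gif?<>");
        System.out.println("<img height=\"\" src=\"" + src + "\" />");
    }
}
```

Because each character is handled in a single pass, an & that is already part of the input is escaped once and the entities produced by the helper are not escaped again.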

Your markup would then be:

<img height="" src="google.gif?&lt;&gt;" />
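To check that the escaped form parses and that the original value comes back, something along these lines should work. This is only a sketch: the class name `EscapedSrcParseDemo` and the method `parseSrc` are invented for the example, and the document is passed in as a String on the assumption that it was read from the database as described in the question.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

public class EscapedSrcParseDemo {

    // Hypothetical helper: parses the XML string and returns the src attribute
    // of the first img element. Parse failures are rethrown unchecked.
    public static String parseSrc(String xml) {
        try {
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(xml)));
            Element img = (Element) doc.getElementsByTagName("img").item(0);
            return img.getAttribute("src");
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<html><head><meta charset=\"utf-8\" /><title>Test</title></head>"
                + "<body><img height=\"\" src=\"google.gif?&lt;&gt;\" /></body></html>";
        // The parser resolves the entities, so the DOM holds the raw characters.
        System.out.println(parseSrc(xml)); // prints google.gif?<>
    }
}
```

The entities exist only in the serialized text. Once parsed, the attribute value contains the literal < and > again, and a serializer or XSL transformation escapes them on output as needed.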

Upvotes: 4
