Dejell

Reputation: 14317

Change all XML entities to HTML

I am reading a document which may contain XML entities such as &#160;.

Since I need to export a plain-text (.txt) file, I have to convert the entities from XML to text manually, as you can see below:

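// is = the document's InputStream; needs java.io.* and java.nio.charset.StandardCharsets
BufferedReader reader = new BufferedReader(new InputStreamReader(is, StandardCharsets.UTF_8));
String s;
while ((s = reader.readLine()) != null) {
    if (s.contains("&#160;")) {
        s = s.replace("&#160;", " "); // one such check per entity
    }
    // ... more checks, then write s to the .txt file
}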

There are many XML entities, and I want to convert all of them to text (for example, &#160; to a space). I'd prefer to avoid a long chain of if statements. Is there a generic way to do this?

Upvotes: 0

Views: 674

Answers (2)

Pace

Reputation: 43867

I believe what you're talking about is called HTML (not XML) decoding. There is a URLDecoder class that does the equivalent job for URL (percent) encoding, which may be what you're decoding. There is also a more general class in Apache Commons for HTML decoding (see this question).

Edit: I was unaware of the difference between HTML and XML escapes/entities; thanks for the clarification. According to this question, Apache Commons has a library for decoding XML entities, but the standard Java library does not.
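If pulling in a library is an option, Apache Commons Text offers StringEscapeUtils.unescapeXml(String). Otherwise, a minimal standard-library sketch could look like the one below. It is an assumption-laden illustration, not a complete decoder: it handles only decimal references (&#160;), hexadecimal references (&#xA0;) and XML's five predefined entities (&lt; &gt; &amp; &quot; &apos;), it expects each reference to end with ';', and named HTML entities such as &nbsp; are left untouched. The class name XmlEntityDecoder is hypothetical.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class XmlEntityDecoder {
    // Matches &#160; (decimal), &#xA0; (hex) and the five predefined XML entities.
    private static final Pattern ENTITY =
            Pattern.compile("&(?:#(\\d+)|#[xX]([0-9a-fA-F]+)|(lt|gt|amp|quot|apos));");

    private static final Map<String, String> PREDEFINED = Map.of(
            "lt", "<", "gt", ">", "amp", "&", "quot", "\"", "apos", "'");

    // Replaces every matched reference in one pass, so no per-entity if chain is needed.
    // Out-of-range numbers (e.g. &#99999999;) throw an exception; this sketch does not catch it.
    public static String decode(String s) {
        Matcher m = ENTITY.matcher(s);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            String replacement;
            if (m.group(1) != null) {
                replacement = new String(Character.toChars(Integer.parseInt(m.group(1))));
            } else if (m.group(2) != null) {
                replacement = new String(Character.toChars(Integer.parseInt(m.group(2), 16)));
            } else {
                replacement = PREDEFINED.get(m.group(3));
            }
            // quoteReplacement keeps a decoded '$' or backslash from being read as a group reference
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(decode("a&#160;b &lt;tag&gt; &#x41;"));
    }
}
```

Note that &#160; decodes to the non-breaking space U+00A0, not an ordinary space; if the .txt output needs a plain space, follow up with decode(s).replace('\u00A0', ' ').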

Upvotes: 1

padis

Reputation: 2354

When you extract the number from &#160;, you can do this:

new String(new byte[]{(byte) 160}, StandardCharsets.ISO_8859_1)

Here are the entity mappings: HTML ISO-8859-1 Reference
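The byte trick above works because ISO-8859-1 maps each byte value n directly to the Unicode code point U+0000 + n. That also means it is only correct for references from 0 to 255: (byte) n silently truncates larger values, so a reference such as &#8364; (the euro sign) comes out wrong, while Character.toChars handles any code point. The small sketch below compares the two; the class and method names are made up for illustration.

```java
import java.nio.charset.StandardCharsets;

public class Latin1Demo {
    // The answer's approach: decode the single byte n as ISO-8859-1.
    // Only valid for 0 <= n <= 255, because (byte) n keeps just the low 8 bits.
    public static String viaLatin1(int n) {
        return new String(new byte[]{(byte) n}, StandardCharsets.ISO_8859_1);
    }

    // Works for any Unicode code point, including ones above U+FFFF.
    public static String viaCodePoint(int n) {
        return new String(Character.toChars(n));
    }

    public static void main(String[] args) {
        // For every Latin-1 value the two approaches agree.
        for (int n = 0; n <= 255; n++) {
            if (!viaLatin1(n).equals(viaCodePoint(n))) {
                throw new AssertionError("mismatch at " + n);
            }
        }
        // 8364 is 0x20AC (euro sign); truncated to one byte it becomes 0xAC (U+00AC, NOT SIGN).
        System.out.println(viaLatin1(8364) + " vs " + viaCodePoint(8364));
    }
}
```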

Upvotes: 2
