Selva

Reputation: 230

How to write special characters (interpunct) in an XML file in Java?

I have a problem writing an XML file with UTF-8 in Java. The problem: I have a file whose name contains an interpunct (middot, ·). When I try to write the filename inside an XML tag from Java code, I get a junk number in the filename instead of ·.

OutputStreamWriter osw = new OutputStreamWriter(file_output_stream, "UTF-8");

Above is the Java code I used to write the XML file. Can anybody help me understand why this happens and how to fix it? Thanks in advance.

Upvotes: 0

Views: 2862

Answers (4)

Ondra Žižka

Reputation: 46876

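Java strings are UTF-16 internally, but javac reads source files in the platform default encoding unless told otherwise (UTF-8 is the default only since JDK 18). If the character does not survive in your source encoding, use a Unicode escape: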

String a = "\u00b7";

Or tell your compiler to use UTF-8 (javac -encoding UTF-8) and simply write the character in the code as-is.
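To check that the writer side is not the culprit, here is a minimal sketch that pushes the escaped character through an OutputStreamWriter and looks at the resulting bytes. The class name MiddotWriteSketch and the helper encodeUtf8 are made up for illustration, and a ByteArrayOutputStream stands in for the question's file_output_stream.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

public class MiddotWriteSketch {

    // Hypothetical helper: encodes text through an OutputStreamWriter,
    // the same kind of writer the question uses, and returns the raw bytes.
    static byte[] encodeUtf8(String text) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (Writer osw = new OutputStreamWriter(out, StandardCharsets.UTF_8)) {
            osw.write(text);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        // The \u00b7 escape is independent of the source file encoding.
        byte[] bytes = encodeUtf8("a\u00b7b");
        StringBuilder hex = new StringBuilder();
        for (byte b : bytes) {
            hex.append(String.format("%02x ", b & 0xff));
        }
        // U+00B7 is encoded in UTF-8 as the two bytes c2 b7.
        System.out.println(hex.toString().trim()); // prints "61 c2 b7 62"
    }
}
```

If the bytes come out as c2 b7 but the file still looks wrong, the viewer is probably decoding it with a different charset; if they come out differently, the string was likely already damaged before it reached the writer.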

Upvotes: 2

Bohemian

Reputation: 425198

That character is Unicode code point U+00B7 (183 decimal; it lies outside ASCII, which ends at 127), so you can escape it as the numeric character reference &#183;. Here is a demonstration: if I type "&#183;" into this answer, I get "·".
The browser prints your character because the page markup resolves numeric character references, just as an XML parser does.

There are utility methods that can do this for you, such as the StringEscapeUtils.escapeXml() method in the Apache Commons Lang library, which is meant to escape the entire input correctly and safely.
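As a rough illustration of what such escaping does, here is a hand-rolled sketch in plain Java. It is not the Commons Lang implementation; the class XmlNumericEscapeSketch and its escape method are hypothetical names, and the sketch assumes the input contains no characters that are illegal in XML (such as most control characters).

```java
public class XmlNumericEscapeSketch {

    // Hypothetical helper: escapes the five XML special characters by name
    // and every non-ASCII code point as a decimal numeric character reference.
    static String escape(String text) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            i += Character.charCount(cp); // step over surrogate pairs as one unit
            switch (cp) {
                case '&':  sb.append("&amp;");  break;
                case '<':  sb.append("&lt;");   break;
                case '>':  sb.append("&gt;");   break;
                case '"':  sb.append("&quot;"); break;
                case '\'': sb.append("&apos;"); break;
                default:
                    if (cp > 0x7f) {
                        sb.append("&#").append(cp).append(';');
                    } else {
                        sb.append((char) cp);
                    }
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(escape("a\u00b7b & c")); // prints "a&#183;b &amp; c"
    }
}
```

Escaping non-ASCII characters this way keeps the file pure ASCII, so it reads the same regardless of which charset a tool assumes. In a file that really is written and declared as UTF-8, the raw · is equally valid XML.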

Upvotes: 1

erickson

Reputation: 269797

Don't try to create XML by hand. Use a library for the purpose. You are just scratching the surface of the heap of special cases that will break a hand-made solution.

One way, using core Java classes, is to create a DOM, then serialize it using a no-op (identity) XSL transform that writes to a StreamResult. (If your document is large, you can do something similar by driving a SAX event handler.)
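The DOM approach might look roughly like the sketch below. The element name "file", the class DomWriteSketch and the helper toXmlBytes are assumptions made for illustration; only the JAXP classes themselves come from the JDK.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

public class DomWriteSketch {

    // Hypothetical helper: builds <file>fileName</file> as a DOM and
    // serializes it to UTF-8 bytes with an identity transform.
    static byte[] toXmlBytes(String fileName) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .newDocument();
            Element root = doc.createElement("file"); // element name is an assumption
            root.setTextContent(fileName);            // no manual escaping here
            doc.appendChild(root);

            // newTransformer() with no stylesheet gives the identity transform.
            Transformer identity = TransformerFactory.newInstance().newTransformer();
            identity.setOutputProperty(OutputKeys.ENCODING, "UTF-8");

            ByteArrayOutputStream out = new ByteArrayOutputStream();
            identity.transform(new DOMSource(doc), new StreamResult(out));
            return out.toByteArray();
        } catch (Exception e) {
            throw new IllegalStateException("XML serialization failed", e);
        }
    }

    public static void main(String[] args) {
        byte[] xml = toXmlBytes("a\u00b7b & c.txt");
        // Decode with the same charset that was requested from the serializer.
        System.out.println(new String(xml, StandardCharsets.UTF_8));
    }
}
```

Handing the StreamResult an OutputStream rather than a Writer lets the serializer apply the encoding it declares in the XML prolog, so the declaration and the actual bytes cannot drift apart. The serializer also takes care of escaping characters such as & in the text content.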

There are many third-party libraries that will help you do the same thing very easily.

Upvotes: 0

Joop Eggen

Reputation: 109597

In general it is a good idea to use UTF-8 everywhere.

The editor has to know that the source is in UTF-8. You could use the free programmer's editor jEdit, which can deal with many encodings.

The javac compiler has to know that the Java source is in UTF-8 (the -encoding UTF-8 option). Alternatively, in the Java source you can use the \u escape from @OndraŽižka's answer.

This makes for two settings in your IDE.

Upvotes: 0
