Harish

UTF Encoding in java

I need to encode a message from a request and write it to a file. I am currently using the URLEncoder.encode() method, but it does not give the expected result for special characters in French and Dutch.

I have also tried URLEncoder.encode("msg", "UTF-8").

Example:
Original message: Pour gérer votre GSM
After encoding: Pour g?rer votre GSM

Can anyone tell me which method I should use for this purpose?

Upvotes: 2

Views: 5828

Answers (7)

Tim Büthe

Reputation: 63794

It seems to me that every single web developer in the world stumbles over this. I'd like to point to an article that helped me a lot:

http://java.sun.com/developer/technicalArticles/Intl/HTTPCharset/

And if you use DB2: this IBM developerWorks article

By the way, I think browsers don't support Unicode in addresses because it would make phishing easy: a page could use characters from one language that look similar to characters in another language.

Upvotes: 0

Daniel Hiller

Reputation: 3485

Use an explicit encoding when creating the string you want to send:

final String input = ...;
final String utf8 = new String( input.getBytes( "UTF-8" ) , "UTF-8" );

Upvotes: 0

Nir Levy

Reputation: 4740

If you are using Tomcat, please see my post on the subject here: http://nirlevy.blogspot.com/2009/02/utf8-and-hebrew-in-tomcat.html

I had the problem with Hebrew, but it's the same for every non-English language.

Upvotes: 0

Tom Anderson

Reputation: 1027

There are many possible causes for the problem you have observed. The primary one is that the request is not giving you UTF-8 in the first place. I imagine that this situation will change over time, but currently there are many weak links that could be to blame: MySQL, PHP 5, HTML, and browsers do not use UTF-8 by default, even though the data may originally have been UTF-8.

See stackoverflow: how-do-i-set-character-encoding-to-utf-8-for-default-html

and java.sun.com: technicalArticles--HTTPCharset

I experienced this problem with Chinese, and for that I'd recommend herongyang.com
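
To illustrate how such a weak link can damage the text, here is a minimal sketch (the class and method names are made up for this example). It assumes two typical failure modes: UTF-8 bytes being decoded with ISO-8859-1, and text being encoded with a charset that has no "é" at all, which gives the "g?rer" shown in the question. The accented character is written as \u00e9 so the example does not depend on the source file encoding.

```java
import java.nio.charset.StandardCharsets;

public class CharsetMismatchDemo {

    // Encodes the text as UTF-8, then (wrongly) decodes those bytes as ISO-8859-1.
    static String decodeUtf8BytesAsLatin1(String text) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.ISO_8859_1);
    }

    // Encodes the text as US-ASCII; characters that ASCII cannot represent
    // are replaced by '?' during encoding.
    static String encodeAsAscii(String text) {
        byte[] ascii = text.getBytes(StandardCharsets.US_ASCII);
        return new String(ascii, StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        String original = "Pour g\u00e9rer votre GSM"; // \u00e9 is 'é'

        // The two UTF-8 bytes of 'é' show up as two separate Latin-1 characters.
        System.out.println(decodeUtf8BytesAsLatin1(original));

        // Prints: Pour g?rer votre GSM
        System.out.println(encodeAsAscii(original));
    }
}
```

If the output looks like the second line, the character was probably lost when the text was encoded; if it looks like the first, the bytes were likely fine but were decoded with the wrong charset.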

Upvotes: 0

A_M

Reputation: 7851

Try doing something like:

BufferedWriter out = new BufferedWriter(new OutputStreamWriter(
                                        new FileOutputStream(file),"UTF-8"));
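
A fuller sketch of the same idea, in case it helps. The class name, the temporary file, and the roundTrip helper are assumptions made for this example only; the point is that the writer is given "UTF-8" explicitly and that the file is read back with the same charset.

```java
import java.io.BufferedWriter;
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;

public class Utf8FileWriteDemo {

    // Writes the message to the file as UTF-8, regardless of the platform default charset.
    static void writeUtf8(File file, String message) throws IOException {
        try (BufferedWriter out = new BufferedWriter(new OutputStreamWriter(
                new FileOutputStream(file), "UTF-8"))) {
            out.write(message);
        }
    }

    // Writes the message to a temporary file and reads it back as UTF-8.
    static String roundTrip(String message) {
        try {
            File file = File.createTempFile("message", ".txt");
            file.deleteOnExit();
            writeUtf8(file, message);
            byte[] bytes = Files.readAllBytes(file.toPath());
            return new String(bytes, StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String message = "Pour g\u00e9rer votre GSM"; // \u00e9 is 'é'
        System.out.println(message.equals(roundTrip(message))); // true
    }
}
```

Whatever later opens the file (an editor, another program) also has to read it as UTF-8, otherwise the accented characters will still look wrong.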

Upvotes: 1

Erich Kitzmueller

Reputation: 36987

URL encoding is not the right thing to do to preserve UTF-8 characters. See

What character set should I assume the encoded characters in a URL to be in?
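
To show what URLEncoder actually produces, here is a small sketch (the class and helper names are invented for this example). It percent-encodes the message from the question for use in a URL, so the result is not readable text you would want in a file, although it decodes back to the original.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;

public class UrlEncodingDemo {

    // Percent-encodes the text using its UTF-8 bytes.
    static String encode(String text) {
        try {
            return URLEncoder.encode(text, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // Every Java platform is required to support UTF-8.
            throw new IllegalStateException(e);
        }
    }

    // Reverses encode(), interpreting the percent-escapes as UTF-8 bytes.
    static String decode(String encoded) {
        try {
            return URLDecoder.decode(encoded, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String original = "Pour g\u00e9rer votre GSM"; // \u00e9 is 'é'
        String encoded = encode(original);
        System.out.println(encoded);                          // Pour+g%C3%A9rer+votre+GSM
        System.out.println(decode(encoded).equals(original)); // true
    }
}
```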

Upvotes: 2

notnoop

Reputation: 59307

Have you tried specifying the encoding for the output stream using [OutputStreamWriter(OutputStream, Charset)](http://java.sun.com/javase/6/docs/api/java/io/OutputStreamWriter.html#OutputStreamWriter(java.io.OutputStream,%20java.nio.charset.Charset))?
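
A minimal sketch of that constructor, assuming an in-memory ByteArrayOutputStream in place of the real file stream so the resulting bytes can be inspected (the class and method names are invented for this example). Passing a Charset object rather than a charset name also avoids the checked UnsupportedEncodingException.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

public class CharsetWriterDemo {

    // Writes the text through an OutputStreamWriter configured with UTF-8
    // and returns the bytes that reached the underlying stream.
    static byte[] toUtf8Bytes(String text) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (Writer writer = new OutputStreamWriter(sink, StandardCharsets.UTF_8)) {
            writer.write(text);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sink.toByteArray();
    }

    public static void main(String[] args) {
        byte[] bytes = toUtf8Bytes("g\u00e9rer"); // \u00e9 is 'é'
        StringBuilder hex = new StringBuilder();
        for (byte b : bytes) {
            hex.append(String.format("%02X ", b & 0xFF));
        }
        // 'é' becomes the two bytes C3 A9, so five characters yield six bytes.
        System.out.println(bytes.length);          // 6
        System.out.println(hex.toString().trim()); // 67 C3 A9 72 65 72
    }
}
```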

Upvotes: 0
