Alex

Reputation: 7067

Encode String to UTF-8

I have a String with a "ñ" character and I'm having problems with it. I need to encode this String in UTF-8. I tried it this way, but it doesn't work:

byte ptext[] = myString.getBytes();
String value = new String(ptext, "UTF-8");

How do I encode that string to UTF-8?

Upvotes: 222

Views: 1165072

Answers (12)

Bokili Production

Reputation: 414

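Another solution, using the java.nio.charset.StandardCharsets constants (Java 7+):

String myUTF8String = new String(sourceISOString.getBytes(StandardCharsets.ISO_8859_1), StandardCharsets.UTF_8);

This only repairs a String that was created by decoding UTF-8 bytes as ISO-8859-1 (for example "Ã±" where "ñ" was expected).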
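A minimal sketch of what this round trip actually undoes, assuming the String was produced by decoding UTF-8 bytes with ISO-8859-1. The class and method names are hypothetical, and non-ASCII letters are written as \u escapes so the example compiles regardless of the source-file encoding:

```java
import java.nio.charset.StandardCharsets;

public class MojibakeRepair {
    // Reverses one specific mistake: UTF-8 bytes that were decoded as ISO-8859-1.
    // ISO-8859-1 maps every byte 0x00-0xFF to exactly one char and back,
    // so re-encoding with it recovers the original UTF-8 bytes unchanged.
    static String repair(String misDecoded) {
        byte[] originalBytes = misDecoded.getBytes(StandardCharsets.ISO_8859_1);
        return new String(originalBytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // U+00F1 (n with tilde) encoded as UTF-8 is the two bytes 0xC3 0xB1
        byte[] utf8 = "\u00F1".getBytes(StandardCharsets.UTF_8);
        // Simulate the bug: decode those bytes with the wrong charset
        String broken = new String(utf8, StandardCharsets.ISO_8859_1);
        System.out.println(broken.equals("\u00C3\u00B1"));        // true: two chars instead of one
        System.out.println(repair(broken).equals("\u00F1"));      // true: original text restored
    }
}
```

The repair only works because no information was lost in between; if the mis-decoded String was, say, saved through a charset that could not represent some of its chars, the original bytes are gone.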

Upvotes: 0

Joachim Sauer

Reputation: 308229

String objects in Java use the UTF-16 encoding that can't be modified*.

The only thing that can have a different encoding is a byte[]. So if you need UTF-8 data, then you need a byte[]. If you have a String that contains unexpected data, then the problem is at some earlier place that incorrectly converted some binary data to a String (i.e. it was using the wrong encoding).

* As an implementation detail (the "compact strings" optimization introduced in Java 9), String can internally use an ISO-8859-1 (Latin-1) encoded byte[] when all of its characters fit in that range. This optimization isn't visible to users of String (i.e. you'll never notice unless you dig into the source code or use reflection to inspect a String object).
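A small sketch of the distinction this answer draws: the String holds one UTF-16 code unit for "ñ", and only the byte[] returned by getBytes has a UTF-8 (or any other) encoding. The class name and the hex helper are illustrative, not part of any library:

```java
import java.nio.charset.StandardCharsets;

public class Utf16VsUtf8 {
    // Formats each byte as two uppercase hex digits separated by spaces, e.g. "C3 B1"
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String s = "\u00F1"; // n with tilde, a single UTF-16 code unit
        System.out.println(s.length());                                  // 1
        System.out.println(hex(s.getBytes(StandardCharsets.UTF_8)));     // C3 B1
        System.out.println(hex(s.getBytes(StandardCharsets.UTF_16BE)));  // 00 F1
    }
}
```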

Upvotes: 162

Laeeq Khan Niazi

Reputation: 648

A quick step-by-step guide to configuring NetBeans to use UTF-8 as its default encoding. As a result, NetBeans will create all new files in UTF-8.

NetBeans default encoding UTF-8 step-by-step guide

  • Go to the etc folder in the NetBeans installation directory

  • Edit the netbeans.conf file

  • Find the netbeans_default_options line

  • Add -J-Dfile.encoding=UTF-8 inside the quotation marks on that line

    (example: netbeans_default_options="-J-Dfile.encoding=UTF-8")

  • Restart NetBeans

NetBeans now uses UTF-8 as its default encoding.

Your netbeans_default_options line may already contain other parameters inside the quotation marks. In that case, add -J-Dfile.encoding=UTF-8 at the end of the string, separated from the other parameters by a space.

Example:

netbeans_default_options="-J-client -J-Xss128m -J-Xms256m -J-XX:PermSize=32m -J-Dapple.laf.useScreenMenuBar=true -J-Dapple.awt.graphics.UseQuartz=true -J-Dsun.java2d.noddraw=true -J-Dsun.java2d.dpiaware=true -J-Dsun.zip.disableMemoryMapping=true -J-Dfile.encoding=UTF-8"


Upvotes: 2

Amir Rachum

Reputation: 79715

How about using

ByteBuffer byteBuffer = StandardCharsets.UTF_8.encode(myString);
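Charset.encode returns a ByteBuffer, not a byte[]. A minimal sketch for copying out just the encoded bytes (the class and method names are made up for illustration); it reads remaining() bytes rather than calling array(), since the buffer's backing array may be larger than the encoded data:

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

public class EncodeToByteBuffer {
    // Encodes the text as UTF-8 and copies exactly the encoded bytes,
    // i.e. the bytes between the buffer's position and its limit.
    static byte[] toUtf8Bytes(String s) {
        ByteBuffer byteBuffer = StandardCharsets.UTF_8.encode(s);
        byte[] bytes = new byte[byteBuffer.remaining()];
        byteBuffer.get(bytes);
        return bytes;
    }

    public static void main(String[] args) {
        byte[] bytes = toUtf8Bytes("\u00F1"); // n with tilde
        System.out.println(bytes.length);     // 2
    }
}
```

In most code myString.getBytes(StandardCharsets.UTF_8) gives the same byte[] more directly; the ByteBuffer form is mainly useful when the result goes straight into an NIO channel.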

Upvotes: 191

Quimbo

Reputation: 652

I ran into this problem once and solved it the following way.

First, import Charset:

import java.nio.charset.Charset;

Then declare constants for UTF-8 and ISO-8859-1:

private static final Charset UTF_8 = Charset.forName("UTF-8");
private static final Charset ISO = Charset.forName("ISO-8859-1");

Then I could use them like this:

// Only useful if these literals were mis-decoded, e.g. a UTF-8 source file compiled
// by javac with an ISO-8859-1 default (the real fix is javac -encoding UTF-8).
// On correctly decoded text this round trip turns each accented letter into U+FFFD.
String textwithaccent = "Thís ís a text with accent";
String textwithletter = "Ñandú";

String text1 = new String(textwithaccent.getBytes(ISO), UTF_8);
String text2 = new String(textwithletter.getBytes(ISO), UTF_8);
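A sketch of the failure mode of the ISO-8859-1 to UTF-8 round trip in the answer above when the String was already decoded correctly. The class name is hypothetical; the behavior relied on is that new String(byte[], Charset) replaces malformed input with U+FFFD instead of throwing:

```java
import java.nio.charset.StandardCharsets;

public class RoundTripOnCleanText {
    // The same getBytes(ISO-8859-1) -> new String(..., UTF-8) conversion
    static String roundTrip(String s) {
        return new String(s.getBytes(StandardCharsets.ISO_8859_1), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String clean = "\u00F1";            // a correctly decoded n with tilde
        String result = roundTrip(clean);   // ISO-8859-1 gives the single byte 0xF1
        // 0xF1 on its own is not valid UTF-8, so the decoder substitutes U+FFFD
        System.out.println(result.equals(clean));          // false
        System.out.println(result.indexOf('\uFFFD') >= 0); // true
    }
}
```

So the conversion is a repair for one specific earlier mistake, not a general way to "encode a String to UTF-8".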

Upvotes: 18

rzymek

Reputation: 9301

In Java 7 and later you can use:

import static java.nio.charset.StandardCharsets.*;

byte[] ptext = myString.getBytes(ISO_8859_1); 
String value = new String(ptext, UTF_8); 

Unlike getBytes(String), getBytes(Charset) does not declare the checked UnsupportedEncodingException, so no try/catch is needed.

If you're using an older Java version you can declare the charset constants yourself:

import java.nio.charset.Charset;

public class StandardCharsets {
    public static final Charset ISO_8859_1 = Charset.forName("ISO-8859-1");
    public static final Charset UTF_8 = Charset.forName("UTF-8");
    //....
}
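A minimal sketch of the exception difference mentioned above (the class, its methods, and the charset name "no-such-charset" are illustrative). getBytes(String) forces callers to handle a checked exception, while a bad name passed to Charset.forName surfaces as the unchecked UnsupportedCharsetException:

```java
import java.io.UnsupportedEncodingException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.charset.UnsupportedCharsetException;

public class CharsetOverloads {
    // getBytes(Charset) declares no checked exception
    static byte[] withCharset(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    // getBytes(String) declares UnsupportedEncodingException, so callers must handle it
    static byte[] withName(String s, String charsetName) throws UnsupportedEncodingException {
        return s.getBytes(charsetName);
    }

    // A misspelled name still fails at runtime with Charset.forName,
    // but as an unchecked UnsupportedCharsetException
    static boolean forNameRejects(String charsetName) {
        try {
            Charset.forName(charsetName);
            return false;
        } catch (UnsupportedCharsetException e) {
            return true;
        }
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        System.out.println(withCharset("abc").length);         // 3
        System.out.println(withName("abc", "UTF-8").length);   // 3
        System.out.println(forNameRejects("no-such-charset")); // true
    }
}
```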

Upvotes: 95

laxman954

Reputation: 135

I used the code below to encode special characters, specifying the encoding explicitly:

import java.nio.charset.Charset;

String text = "This is an example é";
byte[] byteText = text.getBytes(Charset.forName("UTF-8"));
// To get the original string back from the bytes, decode with the same charset.
String originalString = new String(byteText, Charset.forName("UTF-8"));
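Specifying the charset matters on the encoding side too: String.getBytes(Charset) never throws for characters the charset cannot represent, it silently substitutes replacement bytes ('?' for US-ASCII). A small sketch with a hypothetical class name:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class UnmappableCharacters {
    // Thin wrapper over String.getBytes(Charset), which replaces unmappable
    // characters with the charset's default replacement bytes
    static byte[] encode(String text, Charset charset) {
        return text.getBytes(charset);
    }

    public static void main(String[] args) {
        String text = "caf\u00E9"; // "cafe" with e-acute (U+00E9)
        System.out.println(encode(text, StandardCharsets.UTF_8).length);               // 5
        System.out.println(Arrays.toString(encode(text, StandardCharsets.US_ASCII))); // [99, 97, 102, 63]
    }
}
```

The 63 is '?', and decoding those bytes can no longer recover the original character.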

Upvotes: 3

fedesanp

Reputation: 329

byte[] utf8Bytes = myString.getBytes("UTF-8");
String value = new String(utf8Bytes, "UTF-8"); // decode with the same charset, not the platform default

And if you want to read a text file encoded in ISO-8859-1:

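import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

String f = "C:\\MyPath\\MyFile.txt";
// The reader decodes the ISO-8859-1 bytes into Strings; no further conversion is needed.
try (BufferedReader br = Files.newBufferedReader(Paths.get(f), StandardCharsets.ISO_8859_1)) {
    String line;
    while ((line = br.readLine()) != null) {
        System.out.println(line);
    }
} catch (IOException ex) {
    //...
}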
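A self-contained sketch of reading with the right and the wrong charset, using a temporary file instead of the hard-coded path (the helper names are hypothetical). It relies on Files.readAllLines reporting malformed input with MalformedInputException rather than guessing:

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.MalformedInputException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

public class ReadLatin1File {
    // Hypothetical helper: writes the bytes to a temporary file, reads them back
    // as lines with the given charset, and deletes the file again.
    static List<String> writeAndRead(byte[] content, Charset charset) {
        try {
            Path file = Files.createTempFile("encoding-demo", ".txt");
            try {
                Files.write(file, content);
                return Files.readAllLines(file, charset);
            } finally {
                Files.delete(file);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // False when readAllLines rejects the bytes as malformed for this charset
    static boolean readsCleanly(byte[] content, Charset charset) {
        try {
            writeAndRead(content, charset);
            return true;
        } catch (UncheckedIOException e) {
            if (e.getCause() instanceof MalformedInputException) {
                return false;
            }
            throw e;
        }
    }

    public static void main(String[] args) {
        // U+00F1 followed by 'x', encoded as ISO-8859-1: the bytes 0xF1 0x78
        byte[] latin1 = "\u00F1x".getBytes(StandardCharsets.ISO_8859_1);
        // Right charset: the line already holds the correct text
        System.out.println(writeAndRead(latin1, StandardCharsets.ISO_8859_1).get(0).equals("\u00F1x")); // true
        // Wrong charset: 0xF1 0x78 is not valid UTF-8, so reading fails loudly
        System.out.println(readsCleanly(latin1, StandardCharsets.UTF_8)); // false
    }
}
```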

Upvotes: 9

Prasanth RJ

Reputation: 137

This solved my problem:

String inputText = "some text with escaped chars";
// Encode explicitly as UTF-8 instead of relying on the platform default
InputStream is = new ByteArrayInputStream(inputText.getBytes("UTF-8"));
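Whatever later reads that stream has to decode it with the same charset. A minimal round-trip sketch (class and method names are illustrative) using an InputStreamReader with an explicit UTF-8 charset:

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

public class StreamRoundTrip {
    // Wraps the text in an InputStream as UTF-8 bytes, then decodes the first line again.
    // The reader must use the same charset as getBytes, or non-ASCII text breaks.
    static String throughStream(String text) {
        InputStream is = new ByteArrayInputStream(text.getBytes(StandardCharsets.UTF_8));
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(is, StandardCharsets.UTF_8))) {
            return reader.readLine();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(throughStream("se\u00F1or").equals("se\u00F1or")); // true
    }
}
```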

Upvotes: 0

user716840

Reputation: 311

You can try it this way:

byte[] ptext = myString.getBytes("ISO-8859-1");
String value = new String(ptext, "UTF-8");

Upvotes: 26

Michael Borgwardt

Reputation: 346466

A Java String is internally always encoded in UTF-16 - but you really should think about it like this: an encoding is a way to translate between Strings and bytes.

So if you have an encoding problem, by the time you have a String, it's too late to fix. You need to fix the place where you create that String from a file, DB or network connection.
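One way to catch the problem at that boundary is to decode incoming bytes with a CharsetDecoder set to REPORT, so bytes in the wrong encoding fail immediately instead of becoming U+FFFD inside a String. A sketch under the assumption that the incoming data is supposed to be UTF-8; the class and method names are hypothetical:

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.util.Optional;

public class StrictDecode {
    // Decodes bytes received from a file, DB, or socket as UTF-8, returning
    // empty instead of silently inserting U+FFFD when the bytes are not valid UTF-8
    static Optional<String> decodeUtf8Strict(byte[] bytes) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            return Optional.of(decoder.decode(ByteBuffer.wrap(bytes)).toString());
        } catch (CharacterCodingException e) {
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        byte[] utf8 = {(byte) 0xC3, (byte) 0xB1}; // U+00F1 encoded as UTF-8
        byte[] latin1 = {(byte) 0xF1};            // U+00F1 encoded as ISO-8859-1
        System.out.println(decodeUtf8Strict(utf8).isPresent());   // true
        System.out.println(decodeUtf8Strict(latin1).isPresent()); // false
    }
}
```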

Upvotes: 34

Peter Štibraný

Reputation: 32911

Use byte[] ptext = myString.getBytes("UTF-8"); instead of getBytes(). getBytes() uses the so-called "default encoding", which may not be UTF-8 (before Java 18 it depended on the platform and locale).
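A quick sketch for checking what the no-argument getBytes() does on a given JVM (class and method names are illustrative). It compares the default-charset bytes with explicitly requested UTF-8 bytes; on Java 18 and later the default is UTF-8 unless overridden, while older JVMs often used a platform charset such as windows-1252:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class DefaultVsExplicit {
    // getBytes() uses Charset.defaultCharset(); getBytes(UTF_8) does not depend on it
    static boolean defaultMatchesUtf8(String s) {
        return Arrays.equals(s.getBytes(), s.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        System.out.println("default charset: " + Charset.defaultCharset());
        // true when the default charset is UTF-8
        System.out.println(defaultMatchesUtf8("\u00F1"));
    }
}
```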

Upvotes: 77
