Reputation: 7067
I have a String with a "ñ" character and I have some problems with it. I need to encode this String to UTF-8 encoding. I have tried it by this way, but it doesn't work:
byte ptext[] = myString.getBytes();
String value = new String(ptext, "UTF-8");
How do I encode that string to utf-8?
Upvotes: 222
Views: 1165072
Reputation: 414
The correct solution is also:
String myUTF8String = new String(sourceISOString.getBytes(Charsets.ISO_8859_1), Charsets.UTF_8);
Upvotes: 0
Reputation: 308229
String
objects in Java use the UTF-16 encoding that can't be modified*.
The only thing that can have a different encoding is a byte[]
. So if you need UTF-8 data, then you need a byte[]
. If you have a String
that contains unexpected data, then the problem is at some earlier place that incorrectly converted some binary data to a String
(i.e. it was using the wrong encoding).
* As a matter of implementation, String
can internally use a ISO-8859-1 encoded byte[]
when the range of characters fits it, but that is an implementation-specific optimization that isn't visible to users of String
(i.e. you'll never notice unless you dig into the source code or use reflection to dig into a String
object).
Upvotes: 162
Reputation: 648
A quick step-by-step guide how to configure NetBeans default encoding UTF-8. In result NetBeans will create all new files in UTF-8 encoding.
NetBeans default encoding UTF-8 step-by-step guide
Go to etc folder in NetBeans installation directory
Edit netbeans.conf file
Find netbeans_default_options line
Add -J-Dfile.encoding=UTF-8 inside quotation marks inside that line
(example: netbeans_default_options="-J-Dfile.encoding=UTF-8"
)
Restart NetBeans
You set NetBeans default encoding UTF-8.
Your netbeans_default_options may contain additional parameters inside the quotation marks. In such case, add -J-Dfile.encoding=UTF-8 at the end of the string. Separate it with space from other parameters.
Example:
netbeans_default_options="-J-client -J-Xss128m -J-Xms256m -J-XX:PermSize=32m -J-Dapple.laf.useScreenMenuBar=true -J-Dapple.awt.graphics.UseQuartz=true -J-Dsun.java2d.noddraw=true -J-Dsun.java2d.dpiaware=true -J-Dsun.zip.disableMemoryMapping=true -J-Dfile.encoding=UTF-8"
here is link for Further Details
Upvotes: 2
Reputation: 79715
How about using
ByteBuffer byteBuffer = StandardCharsets.UTF_8.encode(myString)
Upvotes: 191
Reputation: 652
In a moment I went through this problem and managed to solve it in the following way
first i need to import
import java.nio.charset.Charset;
Then i had to declare a constant to use UTF-8
and ISO-8859-1
private static final Charset UTF_8 = Charset.forName("UTF-8");
private static final Charset ISO = Charset.forName("ISO-8859-1");
Then I could use it in the following way:
String textwithaccent="Thís ís a text with accent";
String textwithletter="Ñandú";
text1 = new String(textwithaccent.getBytes(ISO), UTF_8);
text2 = new String(textwithletter.getBytes(ISO),UTF_8);
Upvotes: 18
Reputation: 9301
In Java7 you can use:
import static java.nio.charset.StandardCharsets.*;
byte[] ptext = myString.getBytes(ISO_8859_1);
String value = new String(ptext, UTF_8);
This has the advantage over getBytes(String)
that it does not declare throws UnsupportedEncodingException
.
If you're using an older Java version you can declare the charset constants yourself:
import java.nio.charset.Charset;
public class StandardCharsets {
public static final Charset ISO_8859_1 = Charset.forName("ISO-8859-1");
public static final Charset UTF_8 = Charset.forName("UTF-8");
//....
}
Upvotes: 95
Reputation: 135
I have use below code to encode the special character by specifying encode format.
String text = "This is an example é";
byte[] byteText = text.getBytes(Charset.forName("UTF-8"));
//To get original string from byte.
String originalString= new String(byteText , "UTF-8");
Upvotes: 3
Reputation: 329
String value = new String(myString.getBytes("UTF-8"));
and, if you want to read from text file with "ISO-8859-1" encoded:
String line;
String f = "C:\\MyPath\\MyFile.txt";
try {
BufferedReader br = Files.newBufferedReader(Paths.get(f), Charset.forName("ISO-8859-1"));
while ((line = br.readLine()) != null) {
System.out.println(new String(line.getBytes("UTF-8")));
}
} catch (IOException ex) {
//...
}
Upvotes: 9
Reputation: 137
This solved my problem
String inputText = "some text with escaped chars"
InputStream is = new ByteArrayInputStream(inputText.getBytes("UTF-8"));
Upvotes: 0
Reputation: 311
You can try this way.
byte ptext[] = myString.getBytes("ISO-8859-1");
String value = new String(ptext, "UTF-8");
Upvotes: 26
Reputation: 346466
A Java String is internally always encoded in UTF-16 - but you really should think about it like this: an encoding is a way to translate between Strings and bytes.
So if you have an encoding problem, by the time you have String, it's too late to fix. You need to fix the place where you create that String from a file, DB or network connection.
Upvotes: 34
Reputation: 32911
Use byte[] ptext = String.getBytes("UTF-8");
instead of getBytes()
. getBytes()
uses so-called "default encoding", which may not be UTF-8.
Upvotes: 77