Reputation: 3595
I have a small application which reads from an Oracle 9i database and sends the data via e-mail, using JavaMail. The database has NLS_CHARACTERSET = "WE8MSWIN1252", that is, CP1252.
If I run the app without any parameter, it works fine and the e-mails are sent correctly. However, I have a requirement that forces me to run the app with the -Dfile.encoding=utf8 parameter, which results in the text being sent with corrupted characters.
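For context, -Dfile.encoding changes the JVM's default charset, and every API called without an explicit charset (for example String.getBytes() or new InputStreamReader(in)) silently picks it up. A minimal sketch contrasting default and explicit encodings; the class DefaultCharsetDemo and its helper encodedLength are hypothetical names for illustration:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class DefaultCharsetDemo {
    // Number of bytes needed to encode text in the named charset.
    static int encodedLength(String text, String charsetName) {
        return text.getBytes(Charset.forName(charsetName)).length;
    }

    public static void main(String[] args) {
        // On older JVMs this reflects -Dfile.encoding
        // (since Java 18 the default is UTF-8 unless overridden).
        System.out.println("Default charset: " + Charset.defaultCharset());

        String text = "caf\u00e9 \u20ac"; // "café €"

        // No charset argument: the result changes with -Dfile.encoding.
        System.out.println("default:      " + text.getBytes().length + " bytes");

        // Explicit charsets: the result is the same on every JVM.
        System.out.println("windows-1252: " + encodedLength(text, "windows-1252") + " bytes"); // 6
        System.out.println("UTF-8:        " + encodedLength(text, StandardCharsets.UTF_8.name()) + " bytes"); // 9
    }
}
```

Passing the charset explicitly at every byte/char boundary is what makes the code independent of the startup flag.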
I've tried to change the encoding of the data read from the database, with:
String textToSend = new String(textRead.getBytes("CP1252"), "UTF-8");
But it doesn't help. I've tried all possible combinations of CP1252, windows-1252, ISO-8859-1 and UTF-8, but still had no luck.
Any ideas?
Update to clarify my problem: when I do the following:
Statement stat = connection.createStatement(ResultSet.TYPE_SCROLL_INSENSITIVE, ResultSet.CONCUR_READ_ONLY);
stat.executeQuery("SELECT blah FROM blahblah ...");
ResultSet rs = stat.getResultSet();
String textRead = rs.getString("whatever");
I get textRead corrupted, because the database is CP1252 and the application is running in UTF-8. Another approach that I've tried, which also failed:
InputStream is = rs.getBinaryStream("whatever");
Writer writer = new StringWriter();
char[] buffer = new char[1024];
int n;
// Decodes the column bytes as UTF-8
Reader reader = new BufferedReader(new InputStreamReader(is, "UTF-8"));
while ((n = reader.read(buffer)) != -1) {
    writer.write(buffer, 0, n);
}
String textRead = writer.toString();
Upvotes: 4
Views: 4292
Reputation: 237
I had the same problem:
An Oracle database using the WE8MSWIN1252 charset, with some VARCHAR2 column data containing the euro sign (€). Sending the text using JavaMail caused problems with the euro sign.
Finally it works. Two important things you should check or do:
Upvotes: 1
Reputation: 67722
Your driver should do the conversion automatically, and since every CP1252 character can be represented in UTF-8, you shouldn't lose any information.
Can you try the following: get the String with ResultSet.getString, write the string to a file, and open the file with an editor in which you can specify the UTF-8 character set (jEdit, for example). The file should contain UTF-8 data.
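A minimal sketch of that check, assuming the String has already been fetched with ResultSet.getString (a literal euro sign stands in for the database value). The class DumpAsUtf8, its helpers, and the default file name dump.txt are hypothetical. Writing with an explicit UTF-8 charset keeps the file contents independent of -Dfile.encoding, and printing the raw bytes saves you from trusting the editor's guess:

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class DumpAsUtf8 {
    // Writes text to the given file as UTF-8, whatever the default charset is.
    static void dump(String text, Path file) throws IOException {
        Files.write(file, text.getBytes(StandardCharsets.UTF_8));
    }

    // Formats bytes as space-separated upper-case hex, e.g. "E2 82 AC".
    static String toHex(byte[] bytes) {
        StringBuilder hex = new StringBuilder();
        for (byte b : bytes) {
            hex.append(String.format("%02X ", b & 0xFF));
        }
        return hex.toString().trim();
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for rs.getString("whatever").
        String textRead = "\u20ac";
        Path out = Paths.get(args.length > 0 ? args[0] : "dump.txt");
        dump(textRead, out);
        // A correctly decoded euro sign is stored as E2 82 AC in UTF-8.
        System.out.println(toHex(Files.readAllBytes(out)));
    }
}
```

If the hex dump shows E2 82 AC for the euro sign, the String coming out of the driver is fine and the corruption happens later.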
Upvotes: 2
Reputation: 36807
Can you do the conversion in the database? Instead of:
SELECT blah FROM blahblah
Try
SELECT convert(blah, 'UTF8', 'WE8MSWIN1252') FROM blahblah
Upvotes: 0
Reputation: 14763
Your database data is in windows-1252. So, assuming it's being handed back verbatim by the JDBC driver, when you try to convert it to a Java String, that's the charset you need to specify:
Statement stat = connection.createStatement(ResultSet.TYPE_SCROLL_INSENSITIVE, ResultSet.CONCUR_READ_ONLY);
ResultSet rs = stat.executeQuery("SELECT blah FROM blahblah ...");
byte[] rawbytes = rs.getBytes("whatever");
String textRead = new String(rawbytes, "windows-1252");
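The same decoding can be sketched in isolation, assuming (as above) that the driver hands back the stored CP1252 bytes unchanged. Hard-coded bytes stand in for the column value; Cp1252Decode, fromBytes and fromStream are hypothetical names. The stream variant is the getBinaryStream approach from the question, with the reader told the bytes' real charset:

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;

public class Cp1252Decode {
    private static final Charset CP1252 = Charset.forName("windows-1252");

    // Decode a byte[] such as the one returned by rs.getBytes(...).
    static String fromBytes(byte[] raw) {
        return new String(raw, CP1252);
    }

    // Decode an InputStream such as the one returned by rs.getBinaryStream(...).
    static String fromStream(InputStream in) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (Reader reader = new InputStreamReader(in, CP1252)) {
            char[] buffer = new char[1024];
            int n;
            while ((n = reader.read(buffer)) != -1) {
                sb.append(buffer, 0, n);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        // 0x80 is the euro sign in windows-1252; 0xE9 is "é".
        byte[] raw = {(byte) 0x80, (byte) 0xE9};
        System.out.println(fromBytes(raw).equals("\u20ac\u00e9"));                            // true
        System.out.println(fromStream(new ByteArrayInputStream(raw)).equals("\u20ac\u00e9")); // true
    }
}
```

Comparing against escapes rather than printing the text avoids confusing a console-encoding problem with a decoding problem.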
Is part of the requirement that the data be mailed out as UTF-8? If so, the UTF-8 part needs to occur on the output side, not the input side. Once you have String data in Java, it is held as UTF-16 characters, independent of the database's encoding. So when you serialize it out to the MimeMessage, you again need to pick a charset:
mimebodypart.setText(textRead, "UTF-8");
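The two steps (decode with the source charset, encode with the target charset) can be shown with the plain JDK; in the real application, MimeBodyPart.setText(text, "UTF-8") performs the encoding step. This is a sketch only, and Transcode/transcode are hypothetical names:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class Transcode {
    // Decode with the charset the bytes were written in, then encode
    // with the charset the receiver expects.
    static byte[] transcode(byte[] input, Charset from, Charset to) {
        String text = new String(input, from); // input side
        return text.getBytes(to);              // output side
    }

    public static void main(String[] args) {
        byte[] fromDb = {(byte) 0x80}; // euro sign in windows-1252
        byte[] forMail = transcode(fromDb, Charset.forName("windows-1252"), StandardCharsets.UTF_8);
        System.out.println(Arrays.toString(forMail)); // [-30, -126, -84], i.e. E2 82 AC
    }
}
```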
Upvotes: 1
Reputation: 4740
You seem to be lost in charset space; I understand this... :-)
This line
String textToSend = new String(textRead.getBytes("CP1252"), "UTF-8");
does not make much sense. You already have text, and you convert it to a "cp1252"-encoded byte[]. Then you tell the VM to treat those bytes as if they were UTF-8 (which is a lie...).
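A small sketch reproducing that line shows the effect: ASCII survives because CP1252 and UTF-8 agree on bytes 0x00 to 0x7F, but any accented character becomes a byte sequence that is not valid UTF-8, which the decoder replaces with U+FFFD. MislabelledBytes and mislabel are hypothetical names:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class MislabelledBytes {
    // Same as the question's line: encode as CP1252, then claim
    // the bytes are UTF-8 when turning them back into a String.
    static String mislabel(String text) {
        byte[] cp1252 = text.getBytes(Charset.forName("windows-1252"));
        return new String(cp1252, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String original = "caf\u00e9"; // "café"; é is the single byte 0xE9 in CP1252
        String broken = mislabel(original);
        System.out.println(original.equals(broken));       // false
        System.out.println(broken.indexOf('\uFFFD') >= 0); // true: replacement character
        System.out.println(mislabel("abc").equals("abc")); // true: ASCII is unaffected
    }
}
```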
In short: if you have a String as in textRead, you don't have to convert it at all. If something goes wrong, either the text is already rotten (look at it in the debugger) or it gets rotten later on in the API. Can you check this and come back with more detail? Where exactly does the text go wrong, and where do you read it from or write it to?
Upvotes: 1