Reputation: 499
I have a system that receives French text from a third party, but I am having a hard time making it readable.
String frenchReceipt = "RETIR�E"; // The original Text should be "RETIRÉE"
I tried every combination I could think of to convert the string using the UTF-8 and ISO-8859-1 encodings:
String frenchReceipt = "RETIR�E"; // The original Text should be "RETIRÉE"
byte[] b1 = new String(frenchReceipt.getBytes()).getBytes("UTF-8");
System.out.println(new String(b1)); // RETIR�E
byte[] b2 = new String(frenchReceipt.getBytes()).getBytes("ISO-8859-1");
System.out.println(new String(b2)); // RETIR�E
byte[] b3 = new String(frenchReceipt.getBytes(), "UTF-8").getBytes();
System.out.println(new String(b3)); // RETIR?E
byte[] b4 = new String(frenchReceipt.getBytes(), "UTF-8").getBytes();
System.out.println(new String(b4)); //RETIR?E
byte[] b5 = new String(frenchReceipt.getBytes(), "ISO-8859-1").getBytes("UTF-8");
System.out.println(new String(b5)); //RETIR�E
byte[] b6 = new String(frenchReceipt.getBytes(), "UTF-8").getBytes("ISO-8859-1");
System.out.println(new String(b6)); //RETIR?E
byte[] b7 = new String(frenchReceipt.getBytes(), "UTF-8").getBytes("UTF-8");
System.out.println(new String(b7)); //RETIR�E
byte[] b8 = new String(frenchReceipt.getBytes(), "ISO-8859-1").getBytes("ISO-8859-1");
System.out.println(new String(b8)); //RETIR�E
As you can see, nothing fixes the problem.
Please advise.
Update: The third-party partner confirmed that the data is sent to my application in ISO-8859-1 encoding.
Upvotes: 1
Views: 1111
Reputation: 1051
� is just the replacement character (U+FFFD, encoded as EF|BF|BD in UTF-8); it is used to indicate that a system could not decode or render the correct symbol.
This means there is no way to convert � back into É.
frenchReceipt doesn't contain any byte sequence that could be converted into É, because of the declaration:
String frenchReceipt = "RETIR�E";
Your code snippet below should work fine, but you have to use the correct byte source.
byte[] b2 = new String(frenchReceipt.getBytes()).getBytes("ISO-8859-1");
System.out.println(new String(b2));
So if you read "RETIRÉE" as raw bytes from the data source and get 52|45|54|49|52|C9|45 (as expected for ISO-8859-1), then you'll get the proper result.
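To illustrate the point about the byte source, here is a minimal sketch that decodes exactly those seven bytes. The class and method names (DecodeLatin1Demo, decodeLatin1, decodeUtf8) are made up for this example. It compares against Unicode escapes instead of printing the text, so the result does not depend on the console encoding.

```java
import java.nio.charset.StandardCharsets;

public class DecodeLatin1Demo {
    // Hypothetical sample: 52 45 54 49 52 C9 45, i.e. "RETIRÉE" encoded in ISO-8859-1.
    static final byte[] RAW = {0x52, 0x45, 0x54, 0x49, 0x52, (byte) 0xC9, 0x45};

    // Decode with the charset the sender actually used.
    static String decodeLatin1(byte[] raw) {
        return new String(raw, StandardCharsets.ISO_8859_1);
    }

    // Decode with the wrong charset: C9 is not valid UTF-8 here,
    // so the decoder substitutes U+FFFD for it.
    static String decodeUtf8(byte[] raw) {
        return new String(raw, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // \u00C9 is É, \uFFFD is the replacement character.
        System.out.println(decodeLatin1(RAW).equals("RETIR\u00C9E")); // true
        System.out.println(decodeUtf8(RAW).equals("RETIR\uFFFDE"));   // true
    }
}
```

The key design point is that the charset is given explicitly at the moment the bytes become a String; once the wrong charset has been applied, the original C9 byte is gone.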
If the data source already contains the byte sequence EF|BF|BD, the only option you have is search and replace, but in that case there is no way to tell apart, e.g., ä and É.
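A small sketch of why the original letter cannot be recovered after a bad decode: the ISO-8859-1 bytes for É (C9) and ä (E4) both turn into the same U+FFFD when they are wrongly decoded as UTF-8. The class name LossyDecodeDemo and the helper wronglyDecoded are hypothetical, chosen only for this illustration.

```java
import java.nio.charset.StandardCharsets;

public class LossyDecodeDemo {
    // Decode a single ISO-8859-1 byte followed by ASCII 'E' as if it were UTF-8.
    static String wronglyDecoded(int latin1Byte) {
        byte[] raw = {(byte) latin1Byte, 0x45};
        return new String(raw, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String fromEAcute = wronglyDecoded(0xC9); // É in ISO-8859-1
        String fromAUmlaut = wronglyDecoded(0xE4); // ä in ISO-8859-1

        // Both inputs collapse to U+FFFD followed by 'E'.
        System.out.println(fromEAcute.equals("\uFFFDE"));    // true
        System.out.println(fromEAcute.equals(fromAUmlaut));  // true
    }
}
```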
Update: Since the data is delivered over TCP, reading it with
new BufferedReader(new InputStreamReader(connection.getInputStream(), "ISO-8859-1"))
solved the issue.
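For anyone who wants to try the fix without a live connection, here is a self-contained sketch. It assumes the payload is line-based, and it uses a ByteArrayInputStream as a stand-in for connection.getInputStream(); the class name Latin1ReaderDemo and the method readFirstLine are invented for the example.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

public class Latin1ReaderDemo {
    // Read one line from the stream, decoding the bytes as ISO-8859-1.
    static String readFirstLine(InputStream in) throws IOException {
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(in, StandardCharsets.ISO_8859_1))) {
            return reader.readLine();
        }
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for the TCP stream: "RETIRÉE\n" as ISO-8859-1 bytes.
        byte[] wire = {0x52, 0x45, 0x54, 0x49, 0x52, (byte) 0xC9, 0x45, 0x0A};
        String line = readFirstLine(new ByteArrayInputStream(wire));
        System.out.println(line.equals("RETIR\u00C9E")); // true
    }
}
```

Passing StandardCharsets.ISO_8859_1 instead of the string "ISO-8859-1" behaves the same way but avoids the checked UnsupportedEncodingException.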
Upvotes: 2