Reputation: 29286
I am converting a String from UTF-8 to CP1047 and then hex-encoding it, which works fine. Next, I convert back by decoding the hex String and printing it to the console as UTF-8. The problem is that I do not get back the String I passed to the encoding method. Below is the code I wrote:
public class HexEncodeDecode {
public static void main(String[] args) throws UnsupportedEncodingException,
DecoderException {
String reqMsg = "ISO0150000150800C220000080000000040000050000000215102190000000014041615141800001427690161 0B0 000123450041234";
char[] hexed = getHex(reqMsg, "UTF-8", "Cp1047");
System.out.println(hexed);
System.out.println(getString(hexed));
}
public static char[] getHex(String source, String inputCharacterCoding,
String outputCharacterCoding) throws UnsupportedEncodingException {
return Hex.encodeHex(new String(source.getBytes(inputCharacterCoding),
outputCharacterCoding).getBytes(), false);
}
public static String getString(char[] source) throws DecoderException,
UnsupportedEncodingException {
return new String(Hex.decodeHex(source), Charset.forName("UTF-8"));
}
}
The output I am getting is:
C3B1C3AB7CC290C291C295C290C290C290C290C291C295C290C298C290C290C3A41616C290C290C290C290C290C298C290C290C290C290C290C290C290C290C294C290C290C290C290C290C295C290C290C290C290C290C290C29016C291C295C291C29016C291C299C290C290C290C290C290C290C290C290C291C294C290C294C291C296C291C295C291C294C291C298C290C290C290C290C291C2941604C296C299C290C291C296C291C280C290C3A2C290C280C280C280C280C290C290C290C29116C293C294C295C290C290C294C29116C293C294
ñë|äâ
So I need help printing the original input String back.
The expected output would be:
C3B1C3AB7CC290C291C295C290C290C290C290C291C295C290C298C290C290C3A41616C290C290C290C290C290C298C290C290C290C290C290C290C290C290C294C290C290C290C290C290C295C290C290C290C290C290C290C29016C291C295C291C29016C291C299C290C290C290C290C290C290C290C290C291C294C290C294C291C296C291C295C291C294C291C298C290C290C290C290C291C2941604C296C299C290C291C296C291C280C290C3A2C290C280C280C280C280C290C290C290C29116C293C294C295C290C290C294C29116C293C294
ISO0150000150800C220000080000000040000050000000215102190000000014041615141800001427690161 0B0 000123450041234
Upvotes: 1
Views: 15319
Reputation: 3759
A quick fix (though a little ugly) would be to change getString() to:
public static String getString(char[] source) throws DecoderException, UnsupportedEncodingException {
return new String(new String(Hex.decodeHex(source), Charset.forName("UTF-8")).getBytes("Cp1047"),"UTF-8");
}
As fge already mentioned, you keep switching between chars and bytes, which are two entirely different things. This quick solution first decodes the hex output as UTF-8, then encodes the resulting String to a Cp1047 byte array, and finally decodes those bytes back to a String using the UTF-8 charset.
As I said, this is only a quick one-liner workaround and not the cleanest solution, because the actual error is made earlier, during the hex encoding.
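To see why the workaround undoes the damage, here is a minimal sketch that traces both directions with explicit charsets. The class and method names (MojibakeRoundTrip, mangle, unmangle) are hypothetical, and it assumes the platform default charset in the question was UTF-8, which the posted hex output suggests. Hex encoding is left out because it does not change the bytes.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class MojibakeRoundTrip {
    // Assumes the running JDK provides the IBM1047 charset under the alias "Cp1047".
    static final Charset CP1047 = Charset.forName("Cp1047");

    // What getHex() in the question effectively does before hex encoding,
    // assuming the no-argument getBytes() used UTF-8 as the default charset.
    public static byte[] mangle(String source) {
        byte[] utf8 = source.getBytes(StandardCharsets.UTF_8);
        String misread = new String(utf8, CP1047);       // UTF-8 bytes misread as EBCDIC
        return misread.getBytes(StandardCharsets.UTF_8); // misread text encoded again
    }

    // The workaround: apply the inverse of each step, in reverse order.
    public static String unmangle(byte[] mangled) {
        String misread = new String(mangled, StandardCharsets.UTF_8);
        byte[] originalUtf8 = misread.getBytes(CP1047);
        return new String(originalUtf8, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String original = "ISO0150000150800";
        byte[] mangled = mangle(original);
        System.out.println(unmangle(mangled).equals(original));
    }
}
```

The round trip only works because every byte of the ASCII input happens to map to a character that Cp1047 can encode back to the same byte. It is not a real conversion to CP1047.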
Upvotes: 1
Reputation: 27724
reqMsg is already a String and no longer has an encoding, so it's pointless (and damaging) to try to convert it from UTF-8 to "Cp1047".
If reqMsg is going to come from an external source in the future, such as disk or network, then you will have to decode it first; perhaps this is where the confusion comes from. In that case you would be doing: UTF-8->Unicode(String)->CP1047->HEX. When you write it to stdout, the HEX will most likely be ASCII encoded.
The following example creates an ASCII hex string from your original string after conversion to CP1047 (Unicode->CP1047->HEX):
String reqMsg = "ISO0150000150800C220000080000000040000050000000215102190000000014041615141800001427690161 0B0 000123450041234";
// encode the String to Cp1047 bytes, then represent those bytes as hex
byte[] reqMsgBytes = reqMsg.getBytes("Cp1047");
char[] hex = Hex.encodeHex(reqMsgBytes);
System.out.println(hex);
// decode: hex back to Cp1047 bytes, then bytes back to a String
String respMsg = new String(Hex.decodeHex(hex), "Cp1047");
System.out.println(respMsg);
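For the case where the message arrives from an external source as UTF-8 bytes, the full chain (UTF-8->String->CP1047->HEX and back) might look like the sketch below. The class and method names are hypothetical, and it uses small hand-written hex helpers instead of the Hex class so that it runs on the standard library alone.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class Cp1047HexPipeline {
    // Assumes the running JDK provides the IBM1047 charset under the alias "Cp1047".
    static final Charset CP1047 = Charset.forName("Cp1047");

    // Upper-case hex representation of a byte array.
    public static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder(bytes.length * 2);
        for (byte b : bytes) {
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    // Inverse of toHex; expects an even number of hex digits.
    public static byte[] fromHex(String hex) {
        byte[] out = new byte[hex.length() / 2];
        for (int i = 0; i < out.length; i++) {
            out[i] = (byte) Integer.parseInt(hex.substring(2 * i, 2 * i + 2), 16);
        }
        return out;
    }

    // UTF-8 bytes -> String -> Cp1047 bytes -> hex text.
    public static String encode(byte[] utf8Input) {
        String text = new String(utf8Input, StandardCharsets.UTF_8);
        return toHex(text.getBytes(CP1047));
    }

    // Hex text -> Cp1047 bytes -> String.
    public static String decode(String hex) {
        return new String(fromHex(hex), CP1047);
    }

    public static void main(String[] args) {
        byte[] received = "ISO0".getBytes(StandardCharsets.UTF_8); // stands in for external input
        String hex = encode(received);
        System.out.println(hex);         // C9E2D6F0
        System.out.println(decode(hex)); // ISO0
    }
}
```

The only place UTF-8 appears is where the incoming bytes are decoded. After that the text is just a String, and Cp1047 is named once for encoding and once for decoding.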
Upvotes: 2
Reputation: 121760
new String(source.getBytes(inputCharacterCoding), outputCharacterCoding)
.getBytes()
This probably does not do what you think it does.
First things first: a String has no encoding. Repeat after me: a String has no encoding.
A String is simply a sequence of tokens which aim to represent characters. It just happens that for this purpose Java uses a sequence of chars. They could just as well be carrier pigeons.
UTF8, CP1047 and others are just character codings; two operations can be performed:
encoding: turning a stream of carrier pigeons (chars) into a stream of bytes;
decoding: turning a stream of bytes into a stream of carrier pigeons (chars).
Basically, your base assumption is wrong; you cannot associate an encoding with a String. Your real input should be a byte stream (more often than not a byte array) which you know is the result of a particular encoding (in your case, UTF-8), and which you want to re-encode using another charset (in your case, CP1047).
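As an illustration of those two operations, here is a minimal sketch of re-encoding a byte array from UTF-8 to CP1047 and back. The class and method names are hypothetical; the charset calls are the standard java.nio.charset ones.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class ReencodeSketch {
    // Assumes the running JDK provides the IBM1047 charset under the alias "Cp1047".
    static final Charset CP1047 = Charset.forName("Cp1047");

    // Re-encodes bytes that are known to be UTF-8 into CP1047.
    public static byte[] utf8ToCp1047(byte[] utf8Bytes) {
        String text = new String(utf8Bytes, StandardCharsets.UTF_8); // decoding: bytes -> chars
        return text.getBytes(CP1047);                                // encoding: chars -> bytes
    }

    // The reverse direction: CP1047 bytes back to UTF-8 bytes.
    public static byte[] cp1047ToUtf8(byte[] cp1047Bytes) {
        String text = new String(cp1047Bytes, CP1047);
        return text.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] utf8 = "ISO015".getBytes(StandardCharsets.UTF_8);
        byte[] ebcdic = utf8ToCp1047(utf8);
        // Same text, different bytes: 'I' is 0x49 in UTF-8 and 0xC9 in CP1047.
        System.out.printf("%02X -> %02X%n", utf8[0] & 0xFF, ebcdic[0] & 0xFF);
        System.out.println(new String(cp1047ToUtf8(ebcdic), StandardCharsets.UTF_8));
    }
}
```

The String in the middle carries no charset at all; a charset is only named at the two points where bytes are involved.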
The "secret" behind a real answer here would be the code of your Hex.encodeHex() method, but you don't show it, so this is as good an answer as I can muster.
Upvotes: 8