Reputation: 1467
I want to encode a string using both 7-bit and Unicode (UTF-8).
import java.nio.ByteBuffer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class Example {
    public static void main(String[] args) throws Exception {
        String originalMessage = "*ABC";
        sevenBitEncoding(originalMessage);
        unicodeEncoding(originalMessage);
    }

    private static void sevenBitEncoding(String originalMessage) {
        char[] ch = originalMessage.toCharArray();
        byte[] bytes = new String(ch).getBytes();
        StringBuilder encodedMessage = new StringBuilder();
        encodedMessage.append("[");
        for (int i = 0; i < bytes.length; i++) {
            encodedMessage.append(bytes[i] + ",");
        }
        encodedMessage.replace(encodedMessage.length() - 1, encodedMessage.length(), "]");
        System.out.println("7-bit :" + encodedMessage.toString());
    }

    private static void unicodeEncoding(String originalMessage) {
        byte[] bytes = originalMessage.getBytes(StandardCharsets.UTF_8);
        // ByteBuffer byteBuffer = StandardCharsets.UTF_8.encode(originalMessage);
        StringBuilder encodedMessage = new StringBuilder();
        encodedMessage.append("[");
        for (int i = 0; i < bytes.length; i++) {
            encodedMessage.append(bytes[i] + ",");
        }
        encodedMessage.replace(encodedMessage.length() - 1, encodedMessage.length(), "]");
        System.out.println("unicode:" + encodedMessage.toString());
    }
}
Output:
7-bit :[65,66,67]
unicode:[65,66,67]
Expected Output:
Since UTF-8 uses base 16, the expected value for '*' in UTF-8 is 2A. https://flaviocopes.com/unicode/
7-bit :[42,65,66,67]
unicode:[2A,41,42,43]
Is there a way to achieve this?
Upvotes: 0
Views: 1316
Reputation: 73528
You're doing a lot of unnecessary things, such as creating a new String from the char[] of a previous one for no reason, and calling getBytes() without a Charset parameter, which is a no-no. You're also confusing number bases, and somehow think that "Unicode uses hexadecimal", which just doesn't make sense: 2A is simply the hexadecimal way of writing the decimal value 42, so your two expected lines describe the same bytes.
Here's how to show the bytes of a String with a given encoding (UTF-8 in this example):
// Values are decimal, not hex
System.out.println(Arrays.toString("*ABC".getBytes(StandardCharsets.UTF_8)));
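If you really want the hexadecimal form from your expected output, you can format each byte yourself. Something like this should work (HexBytes is just a placeholder class name, not anything standard):

import java.nio.charset.StandardCharsets;
import java.util.StringJoiner;

public class HexBytes {
    public static void main(String[] args) {
        byte[] bytes = "*ABC".getBytes(StandardCharsets.UTF_8);
        StringJoiner joiner = new StringJoiner(",", "[", "]");
        for (byte b : bytes) {
            // Mask with 0xFF so negative bytes print as two unsigned hex digits
            joiner.add(String.format("%02X", b & 0xFF));
        }
        System.out.println(joiner); // prints [2A,41,42,43]
    }
}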
The bytes for *ABC are the same in all common encodings, so if you want to see differences you're going to have to find a very exotic encoding, or use characters that are encoded differently (such as accented characters like é, à, ä, ö, å, which take 2 bytes in UTF-8).
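For instance, a quick sketch (character and second encoding chosen arbitrarily) showing how a non-ASCII character produces different bytes depending on the charset:

import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class EncodingDiff {
    public static void main(String[] args) {
        String s = "é"; // U+00E9
        // Two bytes in UTF-8, one byte in ISO-8859-1
        System.out.println(Arrays.toString(s.getBytes(StandardCharsets.UTF_8)));      // [-61, -87]
        System.out.println(Arrays.toString(s.getBytes(StandardCharsets.ISO_8859_1))); // [-23]
    }
}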
Upvotes: 1