Ruchira Nawarathna

Reputation: 1467

Encode String to UTF-8 and 7-bit encoding

I want to encode a string using both 7-bit and Unicode (UTF-8).

import java.nio.ByteBuffer;  
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class Example{
    public static void main(String[] args) throws Exception{
        String originalMessage = "*ABC";
        sevenBitEncoding(originalMessage);
        unicodeEncoding(originalMessage);
    }

    private static void sevenBitEncoding(String originalMessage) {
        char[] ch=originalMessage.toCharArray();
        byte[] bytes = new String(ch).getBytes();
        StringBuilder encodedMessage = new StringBuilder();
        encodedMessage.append("[");
        for(int i=0; i < bytes.length; i++) {
            encodedMessage.append(bytes[i] + ",");
        }
        encodedMessage.replace(encodedMessage.length()-1, encodedMessage.length(), "]");
        System.out.println("7-bit  :" + encodedMessage.toString());
    }

    private static void unicodeEncoding(String originalMessage) {
        byte[] bytes = originalMessage.getBytes(StandardCharsets.UTF_8);
        // ByteBuffer byteBuffer = StandardCharsets.UTF_8.encode(originalMessage);
        StringBuilder encodedMessage = new StringBuilder();
        encodedMessage.append("[");
        for(int i=0; i < bytes.length; i++) {
            encodedMessage.append(bytes[i] + ",");
        }
        encodedMessage.replace(encodedMessage.length()-1, encodedMessage.length(), "]");
        System.out.println("unicode:" + encodedMessage.toString());
    }
}

Output:

7-bit  :[65,66,67]
unicode:[65,66,67]

Expected Output:

Since UTF-8 values are usually shown in base 16, the expected value for * in UTF-8 is 2A. https://flaviocopes.com/unicode/

7-bit  :[42,65,66,67]
unicode:[2A,41,42,43]

Is there a way to achieve this?

Upvotes: 0

Views: 1316

Answers (1)

Kayaman

Reputation: 73528

You're doing a lot of unnecessary things, such as creating a new String from the char[] of the previous one for no reason, and calling getBytes() without a Charset parameter, which is a no-no (it uses the platform default charset, so results vary between machines). You're also confusing number bases: "Unicode uses hexadecimal" doesn't make sense, since hexadecimal is just a way of displaying a number, not a property of the encoding. 2A and 42 are the same byte value.

Here's how to show the bytes of a String with given encoding (UTF-8 in example).

// Requires import java.util.Arrays; values are printed in decimal, not hex
System.out.println(Arrays.toString("*ABC".getBytes(StandardCharsets.UTF_8)));
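Since your expected output shows the bytes as hex digits, you can format each byte yourself with String.format instead of relying on Arrays.toString. A minimal sketch (class name HexBytes is just for illustration):

```java
import java.nio.charset.StandardCharsets;

public class HexBytes {
    public static void main(String[] args) {
        byte[] bytes = "*ABC".getBytes(StandardCharsets.UTF_8);
        StringBuilder sb = new StringBuilder("[");
        for (int i = 0; i < bytes.length; i++) {
            // Mask with 0xFF so negative bytes don't widen to a negative int,
            // then print as two uppercase hex digits
            sb.append(String.format("%02X", bytes[i] & 0xFF));
            if (i < bytes.length - 1) {
                sb.append(",");
            }
        }
        sb.append("]");
        System.out.println(sb);  // prints [2A,41,42,43]
    }
}
```

Note that this only changes how the bytes are *displayed*; the byte values in memory are identical either way.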

The bytes for *ABC are the same in all common encodings, so if you want to see differences you're going to have to find a very exotic encoding, or use characters that are encoded differently (such as accented characters like é, à, ä, ö, å which take 2 bytes in UTF-8).
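To actually see two encodings diverge, compare the bytes of an accented character under two standard charsets. A small sketch (class name EncodingDiff is just for illustration; Java bytes are signed, so values above 127 print as negatives):

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class EncodingDiff {
    public static void main(String[] args) {
        String s = "é";
        // UTF-8 encodes é as two bytes, 0xC3 0xA9
        System.out.println(Arrays.toString(s.getBytes(StandardCharsets.UTF_8)));      // [-61, -87]
        // ISO-8859-1 encodes é as a single byte, 0xE9
        System.out.println(Arrays.toString(s.getBytes(StandardCharsets.ISO_8859_1))); // [-23]
    }
}
```

With plain ASCII input like *ABC, both charsets would produce identical bytes, which is why your two methods printed the same output.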

Upvotes: 1
