user1032521

Reputation: 11

Does changing a string to unicode change its length?

I have the String "0443" which has a length of 4. If I do encoding with UTF, will it change the length?

I need to pass these characters as the initialisationVector, but the initialisationVector must be exactly 8 bytes long.

Is there any way I can turn "0443" into 8 bytes by using a UTF encoding?

public static String decrypt(byte[] b) throws Exception {

    byte[] key = "12345678".getBytes("UTF-16");
    byte[] iv = "0443".getBytes("UTF-16");
    System.out.println("Length of iv" + iv.length + "key length.." + key.length);
    SecretKey secretKey = new SecretKeySpec(key, "RC2");
    System.out.println("Key size" + secretKey.getEncoded().length);
    Cipher cipher = Cipher.getInstance("RC2/CBC/NoPadding");
    IvParameterSpec initialisationVector = new IvParameterSpec(iv);
    cipher.init(Cipher.DECRYPT_MODE, secretKey, initialisationVector);
    byte[] cipherText = cipher.doFinal(b);
    String plainText = new String(cipherText, "UTF-8");
    System.out.println("Decrypted Text :: " + plainText);

    return "";
}

Upvotes: 0

Views: 485

Answers (2)

Paŭlo Ebermann

Reputation: 74800

The answer by Affe seems good, but as your question shows some misunderstandings, here are some general remarks:

I have the String "0443" which has a length of 4. If I do encoding with UTF, will it change the length?

There is no encoding "UTF". UTF stands for Unicode (or UCS) transformation format, and is a family of encodings:

  • UTF-8 encodes a string in a varying number of 8-bit units (bytes). An ASCII string like "0443" will be encoded in 4 bytes; every character outside of ASCII needs more than one byte (up to four).
  • UTF-16 encodes a string in a varying number of 16-bit units (double-bytes). Most common characters are encoded in one such unit, but some need two such units (a surrogate pair). In principle there are more two-unit characters than single-unit ones, but they are less commonly used.
  • UTF-32 (or UCS-4) encodes a string in 32-bit units (quadruple-bytes). Every character needs 4 bytes here.
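To make the unit counts above concrete, here is a small sketch that measures a few strings in UTF-8 and in UTF-16BE (the big-endian variant is chosen so that no byte order mark is counted). The class name `EncodedLengths`, the helper `byteLength`, and the sample strings other than "0443" are made up for this illustration.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class EncodedLengths {
    // Number of bytes the given text occupies in the given charset.
    static int byteLength(String text, Charset charset) {
        return text.getBytes(charset).length;
    }

    public static void main(String[] args) {
        String[] samples = {
            "0443",          // four ASCII characters
            "\u00e9",        // e with acute accent (outside ASCII)
            "\u20ac",        // euro sign
            "\uD83D\uDE00"   // one emoji, stored as a surrogate pair in a Java String
        };
        for (String s : samples) {
            System.out.println(
                "chars=" + s.length()
                + " utf8=" + byteLength(s, StandardCharsets.UTF_8)
                + " utf16be=" + byteLength(s, StandardCharsets.UTF_16BE));
        }
    }
}
```

Note that `String.length()` counts 16-bit units, not characters, which is why the emoji reports a length of 2.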

For UTF-32 and UTF-16, the order of the bytes inside each unit is important, and thus there are two common versions (Big Endian and Little Endian). Sometimes a byte order mark will be prepended to an encoded text if the byte order (or maybe the encoding at all) might be unknown to the receiver of the message. (For UTF-8, the order of the bytes is fixed.)

Java does this when you encode with UTF-16, so you'll get two more bytes. Use UTF-16BE or UTF-16LE instead, which don't add this mark.

About your cryptography:

It is generally a bad idea to use a simple string like "12345678" directly as a cryptographic key. This way you have (assuming only decimal digits) only log_2(10^8) ~ 26.6 bits of entropy, instead of the 128 bits possible for a 128-bit key. Trying all possible keys of this form can be done in seconds.

The use of an initialization vector depends on the mode of operation. You are using CBC-mode, where the initialization vector should be random (not even partly predictable before the plaintext is decided). A fixed initialization vector makes your encryption even weaker.

Either use a random key, or, if you must use a password, use a longer password and hash it with some salt (and a high iteration count), for example with PBKDF2 or bcrypt, to generate the key. (The salt could be sent with the message, or be generated from parameters such as the names of your communication partners, just something which is different for each use.)

If you are deriving your key this way, you can also derive the IV from the same data (but then make sure to use a different salt for each message). Otherwise, generate a random initialization vector and send it with each message. (It does not need to be secret, just random.)
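A minimal sketch of the password-based variant, assuming the JDK's built-in "PBKDF2WithHmacSHA256" key factory is available (it is in Java 8 and later). The class name `KeyAndIv`, the password, the 16-byte salt, the 128-bit key length and the iteration count are all illustrative choices, not recommendations from the question; pick parameters that fit your own setup.

```java
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import javax.crypto.SecretKeyFactory;
import javax.crypto.spec.PBEKeySpec;

class KeyAndIv {
    // Fresh random bytes, usable as a salt or as an initialization vector.
    static byte[] randomBytes(int count) {
        byte[] out = new byte[count];
        new SecureRandom().nextBytes(out);
        return out;
    }

    // Derives keyBits bits of key material from a password with PBKDF2.
    static byte[] deriveKey(char[] password, byte[] salt, int iterations, int keyBits) {
        PBEKeySpec spec = new PBEKeySpec(password, salt, iterations, keyBits);
        try {
            SecretKeyFactory factory = SecretKeyFactory.getInstance("PBKDF2WithHmacSHA256");
            return factory.generateSecret(spec).getEncoded();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("PBKDF2 is not available", e);
        } finally {
            spec.clearPassword();
        }
    }

    public static void main(String[] args) {
        byte[] salt = randomBytes(16);   // sent along with the message
        byte[] iv = randomBytes(8);      // 8 bytes, the IV size the question asks for
        byte[] key = deriveKey("a much longer passphrase".toCharArray(), salt, 100_000, 128);
        System.out.println("key bytes: " + key.length + ", iv bytes: " + iv.length);
    }
}
```

The receiver needs the same salt and iteration count to re-derive the key, so both travel with the message alongside the IV.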

Also, you should combine your encryption with a message authentication code, otherwise you are open to chosen-ciphertext attacks on CBC-mode.
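One way to do this is to compute the MAC over the IV and the ciphertext after encrypting, and to check it before decrypting. The following is only a sketch of that idea: HMAC-SHA256 is my choice of MAC here, and the class name `MessageTag` and its method names are hypothetical. It assumes a separate MAC key, not the encryption key itself, and a fixed-length IV so that the concatenation is unambiguous.

```java
import java.security.GeneralSecurityException;
import java.security.MessageDigest;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

class MessageTag {
    // Computes an HMAC-SHA256 tag over the IV followed by the ciphertext.
    static byte[] tag(byte[] macKey, byte[] iv, byte[] cipherText) {
        try {
            Mac mac = Mac.getInstance("HmacSHA256");
            mac.init(new SecretKeySpec(macKey, "HmacSHA256"));
            mac.update(iv);
            return mac.doFinal(cipherText);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("HmacSHA256 is not available", e);
        }
    }

    // Recomputes the tag and compares it with the received one.
    // MessageDigest.isEqual is used instead of Arrays.equals to avoid
    // leaking the position of the first mismatch through timing.
    static boolean verify(byte[] macKey, byte[] iv, byte[] cipherText, byte[] receivedTag) {
        return MessageDigest.isEqual(tag(macKey, iv, cipherText), receivedTag);
    }
}
```

The receiver would call `verify` first and only pass the ciphertext to `cipher.doFinal` if it returns true.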

Upvotes: 2

Affe

Reputation: 47994

You're getting 10 bytes instead of 8 in your array because Java is outputting a byte order mark (Little Endian vs Big Endian) before the text. If you only want the plain 8 bytes you need to find out which format the code that receives the bytes is expecting and then specify it.

byte[] iv ="0443".getBytes("UTF-16BE");

or

byte[] iv ="0443".getBytes("UTF-16LE");

Either of these will give you only the 8 bytes of the characters, in the specified byte order.
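To see the difference, the bytes can be printed in hex. This is a small sketch; the class name `BomDemo` and the `hex` helper are made up for the illustration, and it uses the `StandardCharsets` constants rather than charset names so that no `UnsupportedEncodingException` has to be handled.

```java
import java.nio.charset.StandardCharsets;

class BomDemo {
    // Formats bytes as space-separated two-digit upper-case hex values.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String text = "0443";
        // 10 bytes: FE FF (the byte order mark) followed by the big-endian units
        System.out.println("UTF-16   : " + hex(text.getBytes(StandardCharsets.UTF_16)));
        // 8 bytes: 00 30 00 34 00 34 00 33
        System.out.println("UTF-16BE : " + hex(text.getBytes(StandardCharsets.UTF_16BE)));
        // 8 bytes: 30 00 34 00 34 00 33 00
        System.out.println("UTF-16LE : " + hex(text.getBytes(StandardCharsets.UTF_16LE)));
    }
}
```

The two 8-byte results differ, so the IV only matches on both sides if sender and receiver agree on the byte order.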

Upvotes: 2
