MrPublic
MrPublic

Reputation: 518

First Byte's Bit Off-By-One after Steganography

Currently working on a Steganography project where, given a message in bytes and the number of bits to modify per byte, hide a message in an arbitrary byte array.

In the first decoded byte of the resulting message, the value has it's first (leftmost) bit set to '1' instead of '0'. For example, when using message "Foo".getBytes() and maxBits = 1 the result is "Æoo", not "Foo" (0b01000110 gets changed to 0b11000110). With message "Æoo".getBytes() and maxBits = 1 result is "Æoo", meaning the bit is not getting flipped as far as I can tell.

Only certain values of maxBits for certain message bytes cause this error, for example "Foo" encounters this problem at maxBits equal to 1, 5, and 6, whereas "Test" encounters this problem at maxBits equal to 1, 3, and 5. Only the resulting first character ends up with its first bit set, and this problem only occurs at the specified values of this.maxBits related to the initial data.

Encode and Decode Methods:

public byte[] encodeMessage(byte[] data, byte[] message) {
    byte[] encoded = data;
    boolean[] messageBits = byteArrToBoolArr(message);
    int index = 0;
    for (int x = 0; x < messageBits.length; x++) {
        encoded[index] = messageBits[x] ? setBit(encoded[index], x % this.maxBits) : unsetBit(encoded[index], x % this.maxBits);
        if (x % this.maxBits == 0 && x != 0)
            index++;
    }
    return encoded;
}

public byte[] decodeMessage(byte[] data) {
    boolean[] messageBits = new boolean[data.length * this.maxBits];
    int index = 0;
    for (int x = 0; x < messageBits.length; x++) {
        messageBits[x] = getBit(data[index], x % this.maxBits);
        if (x % this.maxBits == 0 && x != 0)
            index++;
    }
    return boolArrToByteArr(messageBits);
}

Unset, Set, and Get Methods:

public byte unsetBit(byte data, int pos) {
    return (byte) (data & ~((1 << pos)));
}

public byte setBit(byte data, int pos) {
    return (byte) (data | ((1 << pos)));
}

public boolean getBit(byte data, int pos) {
    return ((data >>> pos) & 0x01) == 1;
}

Conversion Methods:

public boolean[] byteArrToBoolArr(byte[] b) {
    boolean bool[] = new boolean[b.length * 8];
    for (int x = 0; x < bool.length; x++) {
        bool[x] = false;
        if ((b[x / 8] & (1 << (7 - (x % 8)))) > 0)
            bool[x] = true;
    }
    return bool;
}

public byte[] boolArrToByteArr(boolean[] bool) {
    byte[] b = new byte[bool.length / 8];
    for (int x = 0; x < b.length; x++) {
        for (int y = 0; y < 8; y++) {
            if (bool[x * 8 + y]) {
                b[x] |= (128 >>> y);
            }
        }
    }
    return b;
}

Sample Code and Output:

    test("Foo", 1);//Æoo
    test("Foo", 2);//Foo
    test("Foo", 3);//Foo
    test("Foo", 4);//Foo
    test("Foo", 5);//Æoo
    test("Foo", 6);//Æoo
    test("Foo", 7);//Foo
    test("Foo", 8);//Foo

    test("Test", 1);//Ôest
    test("Test", 2);//Test
    test("Test", 3);//Ôest
    test("Test", 4);//Test
    test("Test", 5);//Ôest
    test("Test", 6);//Test
    test("Test", 7);//Test
    test("Test", 8);//Test

    private static void test(String s, int x) {
        BinaryModifier bm = null;
        try {
            bm = new BinaryModifier(x);//Takes maxBits as constructor param
        } catch (BinaryException e) {
            e.printStackTrace();
        }
        System.out.println(new String(bm.decodeMessage(bm.encodeMessage(new byte[1024], s.getBytes()))));
        return;
    }

Upvotes: 1

Views: 233

Answers (2)

Reti43
Reti43

Reputation: 9797

Your logic of incrementing index has two flaws, which overwrite the first bit of the first letter. Obviously, the bug is expressed when the overwriting bit is different to the first bit.

if (x % this.maxBits == 0 && x != 0)
    index++;

The first problem has to do with embedding only one bit per byte, i.e. maxBits = 1. After you have embedded the very first bit and reached the above conditional, x is still 0, since it will be incremented at the end of the loop. You should be incrementing index at this point, but x != 0 prevents you from doing so. Therefore, the second bit will also be embedded in the first byte, effectively overwriting the first bit. Since this logic also exists in the decode method, you read the first two bits from the first byte.

More specifically, if you embed a 00 or 11, it will be fine. But a 01 will be read as 11 and a 10 will be read as 00, i.e., whatever value is the second bit. If the first letter has an ascii code less or equal than 63 (00xxxxxx), or greater or equal than 192 (11xxxxxx), it will come out fine. For example:

# -> # : 00100011 (35) -> 00100011 (35)
F -> Æ : 01000110 (70) -> 11000110 (198)

The second problem has to do with the x % this.maxBits == 0 part. Consider the case where we embed 3 bits per byte. After the 3rd bit, when we reach the conditional we still have x = 2, so the modulo operation will return false. After we have embedded a 4th bit, we do have x = 3 and we're allowed to move on to the next byte. However, this extra 4th bit will be written at the 0th position of the first byte, since x % this.maxBits will be 3 % 3. So again, we have a bit overwriting our very first bit. However, after the first cycle the modulo operation will correctly write only 3 bits per byte, so the rest of our message will be unaffected.

Consider the binary for "F", which is 01000110. By embedding N bits per byte, we effectively embed the following groups in the first few bytes.

1 bit  01 0 0 0 1 1 0
2 bits 010 00 11 0x
3 bits 0100 011 0xx
4 bits 01000 110x
5 bits 010001 10xxxx
6 bits 0100011 0xxxxx
7 bits 01000110
8 bits 01000110x

As you can see, for groups of 5 and 6 bits, the last bit of the first group is 1, which will overwrite our initial 0 bit. For all other cases the overwrite doesn't affect anything. Note that for 8 bits, we end up using the first bit of the second letter. If that happened to have an ascii code greater or equal than 128, it would again overwrite the firstmost 0 bit.

To address all problems, use either

for (int x = 0; x < messageBits.length; x++) {
    // code in the between
    if ((x + 1) % this.maxBits == 0)
        index++;
}

or

for (int x = 0; x < messageBits.length; ) {
    // code in the between
    x++;
    if (x % this.maxBits == 0)
        index++;
}

Your code has another potential problem which hasn't been expressed. If your data array has a size of 1024, but you only embed 3 letters, you will affect only the first few bytes, depending on the value of maxBits. However, for the extraction, you define your array to have a size of data.length * this.maxBits. So you end up reading bits from all of the bytes of the data array. This is currently no problem, because your array is populated by 0s, which are converted to empty strings. However, if your array had actual numbers, you'd end up reading a lot of garbage past the point of your embedded data.

There are two general ways of addressing this. You either

  • append a unique sequence of bits at the end of your message (marker), such that when you encounter that sequence you terminate the extraction, e.g. eight 0s, or
  • you add a few bits before embedding your actual data (header), which will tell you how to extract your data, e.g., how many bytes and how many bits per byte to read.

Upvotes: 1

QuantumMechanic
QuantumMechanic

Reputation: 13946

One thing you're probably going to run afoul of is the nature of character encoding.

When you call s.getBytes() you are turning the string to bytes using your JVM's default encoding. Then you modify the bytes and you create a new String from the modified bytes again using the default encoding.

So the question is what is that encoding and precisely how does it work. For example, the encoding may well in some cases only be looking at the lower 7 bits of a byte relating to the character, then your setting of the top bit won't have any effect on the string created from the modified bytes.

If you really want to tell if your code is working right, do your testing by directly examining the byte[] being produced by your encode and decode methods, not by turning the modified bytes into strings and looking at the strings.

Upvotes: 0

Related Questions