Assembly x86 MASM - Problem of extracting 5 bits from memory

Question

I have a problem to solve and I have no idea on how to go about it. I'm asking for a general idea on how to go about this. I have a memory address, in ESI. Memory represents some kind of simplified ASCII encoding, where 5 bits, one after another mean a specific character. The memory ends with five bits - 00000b. In order to convert to normal 8-bit ASCII 60H has to be added to each 5-bit value. I would want to store each 5-bit encoded character as 8-bit ASCII encoded under address EDI. EDI also has to end with 0 - 00000000b.

Example: [ESI] = 00011 00010 11010 00000b [EDI] = 01100011 01100010 01111010 00000000b

How i would go about extracting each 5 bits, one after another from esi?

Erik Eidt · Accepted Answer

First and foremost, you need to determine how the 5-bit codes are arranged within the input byte stream.

We need to start with whole input data bytes: if we say that [ESI] = 00011 00010 11010 00000, we are skipping over an important first step in the interpretation and decoding of the 8-bit bytes.

Just like with endian-ness, there are two sensible ways this can be done. One of them is more appropriate for the intel instruction set.

 76543210   bit numbering position
+--------+
|        |      first input data byte
+--------+
    .....    (A) possible position of 5-bit code within first input byte
 .....       (B) alternate position of 5-bit code within first input byte

Choice (A) says that the first 5-bit code is located in least significant bits of the first input data byte, while choice (B) says that the first 5-bit code is located in the most significant bits of the first input data byte.

As the intel machine is little endian and numbers bits from LSB to MSB, choice (A) is more natural on that processor.

Choice (A) would view the input byte stream as follows:

    1       2        3         input byte stream byte number
76543210 76543210 76543210     input byte bit position numbers
hgfedcba ponmlkji xwvutsrq     input byte stream, individual bits as letters
22211111 43333322 55554444     5-bit code number for each bit
   /   /   \  \
  /   /     |  |
 /   /      |  |
/   /       |  |
edcba jihgf onmlk tsrqp yxwvu  5-bit code output stream
  1     2     3     4     5    5-bit code output number

You can see that bits for 5-bit code #2 cross a byte boundary, as does for 5-bit code #4.

With Choice (B) the view would look like this:

    1       2        3         input byte stream byte number
76543210 76543210 76543210     input byte bit position numbers
abcdefgh ijklmnop qrstuvwx     input byte stream, individual bits as letters
11111222 22333334 44445555     5-bit code number for each bit
|   | \   |\   \    \
|   | |   | \   \    |
|   | |   | |   | \   |
abcde fghij klmno pqrst uvwxy  5-bit code output stream
  1     2     3     4     5    5-bit code output number

This is more like what you're showing in your text, so you really need to figure out which choice to use. A concrete example will show where within the first data byte that the first 5-bit code is located.

Either way, one approach is to take in 8-bit bytes and put out 5-bit codes!

The way to deal with the size mismatch is to use a bit queue (variable length) as a buffer, and conditionally take a whole input byte into the bit queue whenever the bit count in the bit queue is less than 5.

(This is a buffering approach; buffering is used everywhere, such as to satisfy size mismatches when an application wants to stream in or out characters/bytes (think printf, and lines of text) yet the device, like file system, wants to write blocks (chunks of 4096).)

As such, the size of the bit_queue is 4 (the largest value not enough to make a 5-bit code) + 8 (a whole new byte) = 12 bits in the queue at maximum — so, the bit_queue will fit in a 16-bit register, and also, we will need another variable as bit count (max value 12).

Basically, we can have a loop that, for each iteration, always outputs one 5-bit code, and conditionally consumes one 8-bit input data byte (e.g. when needed). This approach will keep the queue supplied with enough bits to output a 5-bit code every iteration.

More formally:

    int bit_count = 0;
    uint_16 bit_queue = 0;

loop:
    if bit_count < 5 then          // need to add more bits? so get a byte
        temp8 = *sourcePointer++   // fetch byte/8-bits from memory
        temp16 = temp8             // zero extend temp8 to 16 bits
                                   //  to match the size of bit_queue
        bit_queue = merge(bit_queue, temp16)
        bit_count += 8             // we just added 8 bits to the bit queue
    endif
    temp5 = extract5(bit_queue)    // take the 5 bits of interest
    bit_queue = remove5(bit_queue) // discard 5 bit code from the bit queue
    bit_count -= 5                 // we just removed 5 bits from bit queue
    *destinationPointer++ = temp5 + 0x60
    if temp5 != 0 goto loop

I have written merge, extract5, and remove5 as functions in that pseudo code but they are simple Shift, Or, and Mask logical operations to be done inline (rather than calling functions).

The reason for showing them more abstractly as functions is that their exact definition depends on choice (A) vs. (B).

For choice (A), the new 8 bits from each next input data byte are positioned at the high (more significant) side of the queue, and 5-bit codes are removed from the LSB portion of the bit queue, whereas for choice (B) the opposite — new 8 bits from each next input data byte are positioned at the low/LSB side of the queue and 5-bit codes are removed from the MSB portion of the bit queue.

Example using choice (A):

Initially, the bit_queue is empty and bit_count is zero. Thus, the conditional operation to enter an input data byte into the bit_queue simply loads the bit_queue with first data byte (the merge operation merges with the empty queue, so will be a degenerate shift of zero positions, and simply copying the value as is.)

  00000000hgfedcba     bit_queue loaded with first byte

                 8     bit_count

Function extract5. The 5-bit code is simply the mask of the bit_queue with 0x1f, yielding

  00000000hgfedcba     bit_queue loaded with first byte
  0000000000011111     mask, 0x1f     
 ----------------- &   mask operation
  00000000000edcba     5-bit code result

We can add the 0x60 and send this to the output data stream.

Next function remove5:

  bit_queue >>= 5      discard 5 bits by shifting them off the end

  00000000hgfedcba     bit_queue: before shift
  0000000000000hgf     bit_queue: after shift

Note that for removal, we also decrement bit_count by 5, indicating there are only 3 bits left in the bit_queue.

And finally merge, which takes the bit_queue and temp16 as input (this is on the second iteration):

  0000000000000hgf     bit_queue: remaining bits after shift
  00000000ponmlkji     temp16: next input data byte, zero extended to 16 bits

We need to shift temp16 so that the "i" in temp16 lines up with the "0" next to "h" in bit_queue. The shift amount is simply the bit_count: that's how many bits in the bit_queue we need to keep, so by shifting temp16 by that amount, this will insert bit_count zeros after the data from the input byte.

  temp16 <<= bit_count // bit_count is 3

  00000000ponmlkji     temp16: before shift
  00000ponmlkji000     temp16: after shift

And finally an "or" operation to merge the shifted temp16 into bit_queue:

  0000000000000hgf     bit_queue after removing the first 5 bits
  00000ponmlkji000     temp16: next input data byte, shifted
 ----------------- |   or operation
  00000ponmlkjihgf     new bit_queue after merge operation, 
                        holding 3 bits from input data byte 1,
                         and 8 bits from input data byte 2

Note that the bit_count now goes to 3+8=11.

This bit_queue is now ready to provide another 5-bit code, and so on.

One advantage of this approach is that it does not require unaligned load capabilities of the processor, so is appropriate, for example, for MIPS as well as x86. It also loads each input data byte only once.

As wholly alternative approach, we can extract any given 5-bit value from its index/position number and the two adjacent bytes that it occupies. With that, we simply extract each and every 5-bit value.

Extracting a 5-bit value is done first by computing the bit position offset of the first bit of the 5-bit value — which can be used to compute the byte offset of the (at most) two bytes that hold the 5-bit field, as well as the shift amount to extract it.

The bit position offset is computed by multiplying the 5-bit index by 5 (or if done in a linear loop, adding 5 to a bit position counter each iteration). The byte offset is computed by dividing that bit position offset by 8 (integer division floors and here a divide by 8 is a simple shift right of 3). Finally adding to that the base of the input array, we have the byte address containing the first bit of the 5-bit value at that 5-bit index. Since the 5-bit field has at least one bit in the first byte, it is thus either fully contained in that first byte or a combination of that first byte and the next successive byte, so we can load a 16-bit value at that address and have the full 5-bit field available.

A scan of the 5-bit fields then goes something like this:

    bit_offset = 0
loop:
    byte_offset = bit_offset >> 3
    address = sourcePointer + byte_offset   // regular/unscaled addition
    temp16 = *address                       // this is a 16-bit load ***
    temp16 = temp16 >> bit_offset
    temp8 = temp16 & 0x1f
    *destinationPointer++ = temp8 + 0x60
    bit_offset += 5
    if temp8 != 0 goto loop

*** This approach loads the two bytes at the 5-bit index, needed or not, while incrementing the byte position for the next iteration by at most 1 byte. It reloads every byte of the input stream multiple times. It loads the two bytes of potential interest into a 16-bit datum, and then shifts to extract the 5-bit field of interest. Naively it relies on unaligned access though it can easily work for hardware that doesn't support unaligned accesses with a fairly simple two byte load and reassemble sequence.

Assembly x86 MASM - Problem of extracting 5 bits from memory

Answers (1)

Related Questions