Reputation: 7925

Extracting a specific byte based on an index value

I can currently extract the low or high byte out of a 16 bit int type and store it into an 8 bit int type. Look at the following code example:

#include <bitset>
#include <cassert>
#include <cstdint>  // for std::uint8_t, etc.
#include <cstdlib>  // for EXIT_SUCCESS
#include <iostream>

// valid values for idx[0,1]
void getByteFrom(std::uint16_t val, std::uint8_t idx, std::uint8_t& res) {  
    assert(idx == 0 || idx == 1);
    res = ((val >> (idx << 3)) & 0xff);
}

int main() {    
    std::uint16_t value = 13579;
    std::bitset<16> bits{ value };

    std::cout << "Reference Bits:\n"
              << bits.to_ulong()
              << '\n' << bits << "\n\n";

    std::uint8_t lowByte = 0, highByte = 0;
    getByteFrom(value, 0, lowByte);
    getByteFrom(value, 1, highByte);

    std::bitset<8> lowBits{ lowByte };
    std::bitset<8> highBits{ highByte };

    std::cout << "lowByte = " << lowByte << '\n';
    std::cout << "lowBits (value): " << lowBits.to_ulong() << '\n';
    std::cout << "lowBits (bits):  " << lowBits << "\n\n";

    std::cout << "highByte = " << highByte << '\n';
    std::cout << "highBits (value): " << highBits.to_ulong() << '\n';
    std::cout << "highBits (bits):  " << highBits << "\n\n";

    return EXIT_SUCCESS;
}

And it generates this output which is expected and desired.

Output

Reference Bits:
13579
0011010100001011

lowByte = ♂
lowBits (value): 11
lowBits (bits):  00001011

highByte = 5
highBits (value): 53
highBits (bits):  00110101

Now I would like to do the same thing but for larger types...

// u8 = std::uint8_t, u16 = std::uint16_t, etc.

// valid idx [0,1,2,3]
void getByteFrom( u32 val, u8 idx, u8& res );

// valid idx [0,1,2,3,4,5,6,7]
void getByteFrom( u64 val, u8 idx, u8& res );

// Also: getting words from dwords and qwords, and getting dwords from qwords

// valid idx [0,1]
void getWordFrom( u32 val, u8 idx, u16& res );

// valid idx [0,1,2,3]
void getWordFrom( u64 val, u8 idx, u16& res );

// valid idx [0,1]
void getDWordFrom( u64 value, u8 idx, u32& res );

Knowing that I can use binary logic to get a single byte from a word:

res = ((val >> (idx << 3)) & 0xff);

What I would like to know is what would be the completed table of binary logical expressions with bit shifting and bit masking to have as reference so I can finish writing my functions?

NOTE: This is in regard to the first two answers below:

For the first answer by user dhanushka: This could be interesting; however, the functions above are not standalone functions. They are going to be implemented as a set of inherited class constructors. I'm trying to have a base Register class and from that create Reg8, Reg16, Reg32, and Reg64 classes. The underlying type of each class is the respective std::uintx_t where x is 8, 16, 32 & 64 respectively. These structs will contain a member of that type as its data, as well as a std::bitset<x> where x is the size in bits of that type.

The constructors will set the value of the uint type member based on which constructor was used and its passed-in parameters. Some constructors will be default 0 initialized; others will be pass by value or reference (explicit). Truncation of bits is okay when constructing a smaller size from a larger size, as long as the desired output is not affected by the truncation. Then the constructors will initialize the bitset<size> member based on the value of the data member.

I'll be using these Register classes as register objects in a virtual machine program. These classes are meant to be simple and fast, yet robust, with a lot of features at very little cost. To this end I would like to try and template these classes as well so that most of the overhead is done at compile time.

Each of the types would represent byte, word, dword & qword size registers of a virtual cpu. One of the features that I want to include is the ability to easily and quickly reverse the order of the bits. Let's say we have a u8 type in a Reg8 struct. Let's say it was constructed from a value of its underlying type and that the value is 222 in decimal. Since this is a std::uint8_t it would look like this under the hood:

dec    binary     hex
222    1101 1110  0xde

I can use bitset's to_string() function to convert it to a string, use std::reverse to reverse the order of the bits, and use std::stoi (with base 2) to convert it back to an int type and overwrite the original members, producing this:

dec    binary     hex
123    0111 1011  0x7b
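The reversal described above can be sketched roughly as follows. This is only an illustration under assumptions: `reverseBits8` is a hypothetical helper name, and it uses `std::stoul` with base 2 rather than `std::stoi`, since the result is unsigned.

```cpp
#include <algorithm>
#include <bitset>
#include <cstdint>
#include <string>

// Hypothetical helper (not part of the original code): reverse the bit
// order of an 8-bit value by going through its textual bit pattern.
inline std::uint8_t reverseBits8(std::uint8_t value) {
    // "11011110" for 222: most significant bit first
    std::string pattern = std::bitset<8>(value).to_string();
    // Reverse the characters, e.g. "01111011"
    std::reverse(pattern.begin(), pattern.end());
    // Parse the reversed pattern as a base-2 number
    return static_cast<std::uint8_t>(std::stoul(pattern, nullptr, 2));
}
```

Calling `reverseBits8(222)` yields 123, matching the table above. The string round trip is convenient but not the fastest option; a loop of shifts and masks would avoid the allocation.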

This way anyone using this virtual machine library can quickly adjust the storage of bits in any manner they need. In the larger register classes, for example Reg16, whose underlying type is std::uint16_t with an accompanying std::bitset<16> member holding the same value, one could easily access each individual byte of the word through the use of bitfields. I would also like to incorporate a built-in function and mode to switch between endians, and this can be done on the fly. By default I think I'll be sticking with little endian since that is what my machine is.

Needless to say, I have been trying various design patterns for the past 4-5 days to get all the couplings together. Overall there will be 4 main ways to construct these registers: default construction (0 initialized); construction from the underlying type, initialized by the passed-in parameter (explicit); construction from a passed-in parameter together with an index value into a larger base type; and finally, construction from other types of Registers. I can pass a Reg64 into a Reg8 constructor and construct a Reg8 from one of the 8 bytes of the Reg64. I can also construct a Reg64 from a single Reg8 that can be inserted into any of its 8 bytes, or from multiple Reg8s. Yes, there is a lot of complexity in setting up these classes, but it's the versatility that I'm after.
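Inserting a single byte into any of the 8 byte positions of a 64-bit value is the inverse of extraction: clear the target byte with a mask, then OR in the shifted replacement. A minimal sketch, assuming a free function on plain integers (`setByteAt` is a hypothetical name, not something from the Register classes):

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: return `whole` with byte number `idx`
// (0 = lowest byte) replaced by `part`; all other bytes are untouched.
inline std::uint64_t setByteAt(std::uint64_t whole, std::uint8_t idx, std::uint8_t part) {
    assert(idx < 8);                                   // a u64 has 8 bytes
    const unsigned shift = idx * 8u;                   // bit offset of the target byte
    const std::uint64_t mask = std::uint64_t{0xff} << shift;
    // Clear the target byte, then place the new byte there
    return (whole & ~mask) | (std::uint64_t{part} << shift);
}
```

For example, `setByteAt(0, 3, 0xAB)` gives `0xAB000000`. The same clear-then-OR pattern would apply to words and dwords with masks `0xffff` and `0xffffffff` and shifts of `idx * 16` and `idx * 32`.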

In my virtual PC; these registers will be used to emulate real registers, except these are kind of dynamic polymorphic registers that have two way communication, bidirectional I/O. Of course I would need some flags for that later on; and I plan on using a bit streaming process using the overloaded operator<< and operator>> to push these registers eventually into string streams. I'm possibly thinking of a vector - matrix based register network system that is a core part of the virtual cpu.

Later on when I start to lay out my op codes, byte codes and mnemonics, I don't think I'll be hard coding them. I'm thinking of having a property file that is read in and parsed, with the information saved into a static hash map.

So when I go to construct the operations of the CPU instead of a conventional stack type system that has all of its op code functionality hard coded; these hash maps will be used to query for the appropriate operation. This could all change in time too; I'm thinking of an event driven priority queue type system. Now in a general concept all of the Registers in the CPU will be 64 bits where the smaller registers are generalized. So if I'm creating say two Reg16 types and doing addition through op or byte codes; the cpu will literally take a single Reg64 and store both Reg16 into different word parts of that 64 bit register. It will then perform the provided addition of the two (in place so to speak) and store the results in one of the remaining word spaces that is left in that register. It will then shift the bits so that you would get the correct value. If you viewed the results from the Reg64 data member it may or may not represent the exact value of the addition as it would depend on the instruction codes if the resulting word was shifted to give that value or not. You can also easily query or return a Reg16 type of this value as that will be preserved.
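To make the in-place idea concrete, here is a rough sketch on a plain 64-bit integer. The layout is an assumption chosen purely for illustration (operands in words 0 and 1, result in word 2), and `addWordsInPlace` is a hypothetical name:

```cpp
#include <cstdint>

// Hypothetical illustration: keep two 16-bit operands and their sum
// inside a single 64-bit value.
// Assumed layout: word 0 = a, word 1 = b, word 2 = a + b (wraps at 16 bits).
inline std::uint64_t addWordsInPlace(std::uint16_t a, std::uint16_t b) {
    std::uint64_t reg = 0;
    reg |= std::uint64_t{a};            // store a in word 0
    reg |= std::uint64_t{b} << 16;      // store b in word 1
    // Read both operands back out of the register and add them
    const auto sum = static_cast<std::uint16_t>((reg & 0xffff) + ((reg >> 16) & 0xffff));
    reg |= std::uint64_t{sum} << 32;    // store the result in word 2
    return reg;
}
```

With this layout the result is read back with `(reg >> 32) & 0xffff`, and the full 64-bit value does not equal the sum unless it is shifted down first, which is the point made above about the Reg64 data member.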

Here is a small example but with Reg32 as the base type for simplicity. This may not be exact, but is shown only to illustrate the concept.

CPU fetches op codes and gets a set of instructions for 2 Reg8s to be added and stored into a Reg32.

// 0x01 - load immediate into reg A
// 0x02 - load immediate into reg B
// 0x10 - add
// 0x0c - store res into 4th byte of Reg32.
// 0xe0 - shift bits in Reg32 to reflect correct value of addition

0000 0001 - load immediate (33) into first byte of Reg32
0000 0010 - load immediate (36) into 2nd byte of Reg32
0001 0000 - add reg a and reg b
0000 1100 - store result of addition into 4th byte of Reg32
1110 0000 - shift bits in Reg32 to reflect actual value of the addition.

// Remember the CPU here is getting a 32bit instruction so all of these 
// byte codes would appear as this in a single 32bit sequence from an 
// instruction register    
// 0x0c100201 this single register contains 4 simultaneous instructions
// This could all possibly be done in one cpu clock cycle, 
// (the relative or conceptual idea, 
// but not necessarily done in practice due to hardware limitations, 
// but can be virtualized) 
// then the next byte code would appear in the 2nd 32 bit register.

// Now imagine this behavior with 64 bit registers. A single 64 bit Register 
// would contain up to 8 byte codes. some byte codes might contain multiple op codes....
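Splitting such a packed instruction word back into its individual byte codes is the same shift-and-mask operation as in the question. A small sketch, assuming the lowest byte holds the first byte code (`splitByteCodes` is a hypothetical name):

```cpp
#include <array>
#include <cstdint>

// Hypothetical helper: split a 32-bit instruction word into its four
// byte codes, lowest byte first.
inline std::array<std::uint8_t, 4> splitByteCodes(std::uint32_t instr) {
    std::array<std::uint8_t, 4> codes{};
    for (unsigned i = 0; i < 4; ++i) {
        // Shift byte i down to the low position and mask off the rest
        codes[i] = static_cast<std::uint8_t>((instr >> (i * 8)) & 0xff);
    }
    return codes;
}
```

For the word `0x0c100201` from the comment above, this yields `0x01`, `0x02`, `0x10`, `0x0c` in that order.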

If you have made it this far in reading this I know that this is quite long; but I'm just trying to give you as much detailed information as I can covering all of the major aspects of my class designs so that you can have a better understanding of what it is I'm trying to do and achieve and why I'm choosing to try and do something a certain way.

I do appreciate the time you took into giving me an answer with some detailed explanation. I'll have to take some time and work on both proposals and test some values to see if it helps me in getting the correct behavior I'm looking for upon construction of my classes.

Upvotes: 0

Answers (3)

Francis Cugler

Reputation: 7925

After sitting there and doing some of the math by hand bit by bit to recognize the patterns I was able to truly simplify my code with the help of a few function templates. Here is what I have so far and the values appear to be matching what I'm expecting.

Edit:

I have added some typedefs into common.h and moved my function templates to it to simplify code readability. I removed magic numbers and replaced them with constants, and I may have adjusted some of the conditional checks too. I even wrapped my code in a namespace. I'll also be including my intended Register classes as they are almost completed, but I won't be using them in this main.cpp.

Edit

I found a few more locations where I was able to substitute my typedefs. More importantly, I found a bug within my Register classes when I was unit testing them. The bug pertains to the order in which the value member and the bitset<T> member were declared. Originally I had bitset<T> declared first, so that was the first thing to get initialized. I had to switch the order of the declarations and now everything appears to be good so far. All of the basic constructors are completed. Now it's a matter of writing the constructors that would create a Register type from multiple smaller Register types, for example Reg32( Reg8, Reg8, Reg16 ). The last set of constructors would take either a smaller uint type or a smaller Reg type along with an index value, e.g. Reg64( Reg32, 0 ), which would assign the bits of the Reg32 into the low DWord of the Reg64, and Reg32( Reg8 3, Reg8 0 ), which would assign the bit sequence of the first Reg8 into the high byte and the second into the low byte of the Reg32; all of the bits in the middle remain unchanged from their previous value.
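The declaration-order bug comes from a C++ rule: non-static data members are initialized in the order they are declared in the class, regardless of the order written in the constructor's initializer list. A stripped-down sketch of the working arrangement (`Reg8Sketch` is a hypothetical stand-in, not the real Reg8):

```cpp
#include <bitset>
#include <cstdint>

// Hypothetical stand-in for Reg8, reduced to the two members that matter.
struct Reg8Sketch {
    std::uint8_t   value; // declared first, so it is initialized first
    std::bitset<8> bits;  // declared second, so it may safely read `value`

    // If `bits` were declared before `value`, `bits{ value }` would read
    // `value` before it had been initialized, whatever order is written here.
    explicit Reg8Sketch(std::uint8_t v) : value{ v }, bits{ value } {}
};
```

Initializing `bits` from the constructor parameter `v` instead of from the member `value` would sidestep the ordering dependency altogether.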

-Updated Code-

main.cpp

#include "common.h"
//#include "Register.h" // if you include this you don't need to include common.h

int main() {
    using namespace nesx;

    std::uint16_t v16 = 23990;
    std::cout << "Byte Testing v16 = 23990\n";
    testBytes(v16);

    std::uint32_t v32 = 1801285115;
    std::cout << "Byte Testing v32 = 1801285115\n";
    testBytes(v32);
    std::cout << "Word Testing v32 = 1801285115\n";
    testWords(v32);

    std::uint64_t v64 = 7486836904524374950;
    std::cout << "Byte Testing v64 = 7486836904524374950\n";
    testBytes(v64);
    std::cout << "Word Testing v64 = 7486836904524374950\n";
    testWords(v64); 
    std::cout << "DWord Testing v64 = 7486836904524374950\n";
    testDWords(v64);

    return EXIT_SUCCESS;
}

common.h

#pragma once

#include <algorithm>
#include <bitset>
#include <cassert>
#include <cstdint>
#include <cstdlib>  // EXIT_SUCCESS, used by main.cpp
#include <iostream>
#include <memory>
#include <map>
#include <string>
#include <sstream>
#include <vector>

namespace nesx {

    typedef std::int8_t i8;
    typedef std::int16_t i16;
    typedef std::int32_t i32;
    typedef std::int64_t i64;

    typedef std::uint8_t u8;
    typedef std::uint16_t u16;
    typedef std::uint32_t u32;
    typedef std::uint64_t u64;

    const u16 BYTE = 0x08, WORD = 0x10, DWORD = 0x20, QWORD = 0x40;

    typedef std::bitset<BYTE>  Byte;
    typedef std::bitset<WORD>  Word;
    typedef std::bitset<DWORD> DWord;
    typedef std::bitset<QWORD> QWord;

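    // Extract byte number idx (0 = lowest) from val
    template<typename T>
    void getByteFrom(T val, u8 idx, u8& res) {
        res = static_cast<u8>((val >> (idx * 8)) & 0xff);
    }

    // Extract word number idx (0 = lowest) from val
    template<typename T>
    void getWordFrom(T val, u8 idx, u16& res) {
        res = static_cast<u16>((val >> (idx * 16)) & 0xffff);
    }

    // Extract dword number idx (0 = lowest) from val
    template<typename T>
    void getDWordFrom(T val, u8 idx, u32& res) {
        res = static_cast<u32>((val >> (idx * 32)) & 0xffffffff);
    }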

    // Direct Byte Alignment No Offsets
    template<typename T>
    void testBytes(T& value) {
        const u16 size = sizeof(T);
        const u16 numBits = size * BYTE;

        // Make sure that T is either a word, dword or qword
        if (numBits < WORD) {
            return;
        }
        if (numBits == WORD) {
            Word wordBits{ value };
            std::cout << "Reference Bits:\n"
                      << "value = " << wordBits.to_ullong() << '\n'
                      << "bits  = " << wordBits << "\n\n";
        }
        if (numBits == DWORD) {
            DWord dwordBits{ value };
            std::cout << "Reference Bits:\n"
                      << "value = " << dwordBits.to_ullong() << '\n'
                      << "bits  = " << dwordBits << "\n\n";
        }
        if (numBits == QWORD) {
            QWord qwordBits{ value };
            std::cout << "Reference Bits:\n"
                      << "value = " << qwordBits.to_ullong() << '\n'
                      << "bits  = " << qwordBits << "\n\n";
        }

        std::vector<u8> bytes;
        std::vector<Byte> byteBits;
        bytes.resize(size, 0);
        byteBits.resize(size, 0);

        // Populate Our Vectors with Data
        for (u8 idx = 0; idx < size; idx++) {
            u8 byte = 0;
            getByteFrom(value, idx, byte);
            bytes[idx] = byte;
            Byte bits{ byte };
            byteBits[idx] = bits;
        }

        // Now loop through and print out the information
        // from the vectors
        for (std::size_t i = 0; i < size; i++) {
            std::cout << "byte[" << i << "] = " << +bytes[i] << '\n';
            std::cout << "bitset (value): " << byteBits[i].to_ullong() << '\n';
            std::cout << "bitset  (bits): " << byteBits[i] << "\n\n";
        }
    }

    // Direct Word Alignment No Offsets
    template<typename T>
    void testWords(T& value) {
        const u16 size = sizeof(T);
        const u16 numBits = size * BYTE;

        // Make sure T is either a dword or a qword
        if (numBits < DWORD) {
            return;
        }

        if (numBits == DWORD) {
            DWord dwordBits{ value };
            std::cout << "Reference Bits:\n"
                      << "value = " << dwordBits.to_ullong() << '\n'
                      << "bits  = " << dwordBits << "\n\n";
        }

        if (numBits == QWORD) {
            QWord qwordBits{ value };
            std::cout << "Reference Bits:\n"
                      << "value = " << qwordBits.to_ullong() << '\n'
                      << "bits  = " << qwordBits << "\n\n";
        }

        const u16 numWords = size / 2;
        std::vector<u16> words;
        std::vector<Word> wordBits;
        words.resize(numWords, 0);
        wordBits.resize(numWords, 0);

        // Populate Our Vectors with Data
        for (u8 idx = 0; idx < numWords; idx++) {
            u16 word = 0;
            getWordFrom(value, idx, word);
            words[idx] = word;
            Word bits{ word };
            wordBits[idx] = bits;
        }

        // Now loop through and print out the information
        // from the vectors
        for (std::size_t i = 0; i < numWords; i++) {
            std::cout << "word[" << i << "] = " << words[i] << '\n';
            std::cout << "bitset (value): " << wordBits[i].to_ullong() << '\n';
            std::cout << "bitset  (bits): " << wordBits[i] << "\n\n";
        }
    }

    // Direct DWord Alignment No Offsets
    template<typename T>
    void testDWords(T& value) {
        const u16 size = sizeof(T);
        const u16 numBits = size * BYTE;

        // Make sure T is a qword
        if (numBits < QWORD) {
            return;
        }

        if (numBits == QWORD) {
            QWord qwordBits{ value };
            std::cout << "Reference Bits:\n"
                      << "value = " << qwordBits.to_ullong() << '\n'
                      << "bits  = " << qwordBits << "\n\n";
        }

        const u16 numDWords = size / 4;
        std::vector<u32> dwords;
        std::vector<DWord> dwordBits;
        dwords.resize(numDWords, 0);
        dwordBits.resize(numDWords, 0);

        // Populate Our Vectors with Data
        for (u8 idx = 0; idx < numDWords; idx++) {
            u32 dword = 0;
            getDWordFrom(value, idx, dword);
            dwords[idx] = dword;
            DWord bits{ dword };
            dwordBits[idx] = bits;
        }

        // Now loop through and print out the information from the vectors
        for (std::size_t i = 0; i < numDWords; i++) {
            std::cout << "dword[" << i << "] = " << dwords[i] << '\n';
            std::cout << "bitset (value): " << dwordBits[i].to_ullong() << '\n';
            std::cout << "bitset  (bits): " << dwordBits[i] << "\n\n";
        }
    }
} // namespace nesx

Register.h

#pragma once

#include "common.h"

namespace nesx {

    template<typename T>
    struct Register {
        T data;
        Register() = default;
    };

    struct Reg8 : public Register<u8> {
        u8 value;  // must be declared before std::bitset<T>
        Byte bits; // otherwise you will not get the proper bit sequence

        // Default 0 Initialized Constructor
        Reg8() : value{ 0 }, bits{ value } { this->data = 0; }

        // Constructors by Register Sized Values
        // Constructor of smaller types that takes larger types,
        // has to be casted by a narrowing convention
        explicit Reg8(u8& val)  : value{ val }, bits{ value } {
            this->data = value;
        }
        explicit Reg8(u16& val) : value{ static_cast<u8>(val) }, bits{ value } {
            this->data = value;
        }
        explicit Reg8(u32& val) : value{ static_cast<u8>(val) }, bits{ value } {
            this->data = value;
        }
        explicit Reg8(u64& val) : value{ static_cast<u8>(val) }, bits{ value } {
            this->data = value;
        }

        Reg8(u16 val, u8 idx ) {
            assert( idx == 0 || idx == 1 );
            getByteFrom(val, idx, this->value);
            bits = value;
            this->data = value;
        }

        Reg8(u32 val, u8 idx) {
            assert(idx <= 3); // a u32 has 4 bytes: valid idx [0,3]
            getByteFrom(val, idx, this->value);
            bits = value;
            this->data = value;
        }

        Reg8(u64 val, u8 idx) {
            assert(idx <= 7); // a u64 has 8 bytes: valid idx [0,7]
            getByteFrom(val, idx, this->value);
            bits = value;
            this->data = value;
        }

        // Constructors by Register Types
        template<typename T>
        explicit Reg8(Register<T>* reg) {
            this->value = static_cast<u8>( reg->data );
            this->bits = value;
        }

        template<typename T>
        Reg8(Register<T>* reg, u8 idx) {
            // first we need to know what type T is to determine 
            // how many bytes are in T so that we can assert our
            // index properly for each different type
            // sizeof(T) is in bytes; BYTE..QWORD are bit counts
            const u16 numBits = sizeof(T) * BYTE;

            if (numBits == BYTE)  { /* TODO: */ }
            if (numBits == WORD)  { /* TODO: */ }
            if (numBits == DWORD) { /* TODO: */ }
            if (numBits == QWORD) { /* TODO: */ }
        }
    };

    struct Reg16 : public Register<u16> {
        u16 value;  // Must be declared before std::bitset<T>
        Word bits;  // otherwise you will not get the proper bit sequence

        // Default 0 Initialized Constructor
        Reg16() : value{ 0 }, bits{ value } { this->data = 0; }

        // Constructors by Register Sized Values
        // Constructor of smaller types that takes larger types,
        // has to be casted by a narrowing convention
        explicit Reg16(u16& val) : value{ val }, bits{ value } {
            this->data = value;
        }

        explicit Reg16( u8& val) : value{ val }, bits{ value } {
            this->data = value;
        }

        explicit Reg16(u32& val) : value{ static_cast<u16>(val) }, bits{ value } {
            this->data = value;
        }

        explicit Reg16(u64& val) : value{ static_cast<u16>(val) }, bits{ value } {
            this->data = value;
        }

        // TODO:
        // low is right side, high is left side of the bitset...
        // Reg16( u8& byte0, u8& byte1 ) { ... } // byte0 = low && byte1 = high

        Reg16( u32 val, u8  idx) {
            assert(idx == 0 || idx == 1);
            getWordFrom(val, idx, this->value);
            bits = value;
            this->data = value;
        }

        Reg16(u64 val, u8 idx) {
            assert(idx <= 3); // a u64 has 4 words: valid idx [0,3]
            getWordFrom(val, idx, this->value);
            bits = value;
            this->data = value;
        }

        // Constructors by Register Types
        template<typename T>
        explicit Reg16(Register<T>* reg) {
            this->value = static_cast<u16>(reg->data);
            this->bits = value;
        }
    };

    struct Reg32 : public Register<u32> {
        u32 value;  // must be declared before std::bitset<T>
        DWord bits; // otherwise you will not get the proper bit sequence

        // Default 0 Initialized Constructor
        Reg32() : value{ 0 }, bits{ value } { this->data = 0; }

        // Constructors by Register Sized Values
        // Constructor of smaller types that takes larger types,
        // has to be casted by a narrowing convention
        explicit Reg32(u32& val) : value{ val }, bits{ value } {
            this->data = value;
        }

        explicit Reg32( u8& val) : value{val}, bits{value} {
            this->data = value;
        }

        explicit Reg32(u16& val) : value{val}, bits{value} {
            this->data = value;
        }

        explicit Reg32(u64& val) : value{ static_cast<u32>(val) }, bits{ value } {
            this->data = value;
        }

        // TODO: 
        // low is right side, high is left side of bitset
        // Reg32( u8 byte0, u8 byte1, u8 byte2, u8 byte3 ) { ... } // byte0 = low ... byte3 = high
        // Reg32( u16 word0, word1 ) { ... } // word0 = low  word1 = high

        Reg32(u64 val, u8 idx) {
            assert(idx == 0 || idx == 1);
            getDWordFrom(val, idx, this->value);
            bits = value;
            this->data = value;
        }

        // Constructors by Register Types
        template<typename T>
        explicit Reg32(Register<T>* reg) {
            this->value = static_cast<u32>(reg->data);
            this->bits = value;
        }
    };

    struct Reg64 : public Register<u64> {
        u64 value;  // Must be declared before std::bitset<T>
        QWord bits; // Otherwise you will not get the proper bit sequence

        // Default 0 Initialized Constructor
        Reg64() : value{ 0 }, bits{ value } { this->data = 0; }

        // Constructors by Register Sized Values
        // Constructor of smaller types that takes larger types,
        // has to be casted by a narrowing convention
        explicit Reg64(u64& val) : value{ val }, bits{ value }{
            this->data = value;
        }

        explicit Reg64( u8& val) : value{ static_cast<u64>(val) }, bits{ value } {
            this->data = value;
        }

        explicit Reg64(u16& val) : value{ static_cast<u64>(val) }, bits{ value } {
             this->data = value;
        }

        explicit Reg64(u32& val) : value{ static_cast<u64>(val) }, bits{ value } {
             this->data = value;
        }

        // TODO:
        // low is right side, high is left side of bitset
        // Reg64( u8 b0, u8 b1, u8 b2, u8 b3, u8 b4, u8 b5, u8 b6, u8 b7 ) {...} b0 = low ... b7 = high
        // Reg64( u16 w0, u16 w1, u16 w2, u16 w3 );
        // Reg64( u32 dw0, u32 dw1 );

        // Constructors by Register Types
        template<typename T>
        explicit Reg64(Register<T>* reg) {
             this->value = static_cast<u64>(reg->data);
             this->bits = value;
        }
    };
} // namespace nesx

The only difference here is that I'm not asserting in these template functions, but when I port this code into my classes' or structures' constructors I will then assert the appropriate values there.

Here is the Output:

Byte Testing v16 = 23990
Reference Bits:
value = 23990
bits  = 0101110110110110

byte[0] = ╢     // with promoted uchar 182
bitset (value): 182
bitset  (bits): 10110110

byte[1] = ]     // with promoted uchar 93
bitset (value): 93
bitset  (bits): 01011101

Byte Testing v32 = 1801285115
Reference Bits:
value = 1801285115
bits  = 01101011010111010110110111111011

byte[0] = √     // with promoted uchar 251
bitset (value): 251
bitset  (bits): 11111011

byte[1] = m     // with promoted uchar 109
bitset (value): 109
bitset  (bits): 01101101

byte[2] = ]     // with promoted uchar 93
bitset (value): 93
bitset  (bits): 01011101

byte[3] = k     // with promoted uchar 107
bitset (value): 107
bitset  (bits): 01101011

Word Testing v32 = 1801285115
Reference Bits:
value = 1801285115
bits  = 01101011010111010110110111111011

word[0] = 28155
bitset (value): 28155
bitset  (bits): 0110110111111011

word[1] = 27485
bitset (value): 27485
bitset  (bits): 0110101101011101

Byte Testing v64 = 7486836904524374950
Reference Bits:
value = 7486836904524374950
bits  = 0110011111100110100101100111111101101001011101011110001110100110

byte[0] = ª     // with promoted uchar 166
bitset (value): 166
bitset  (bits): 10100110

byte[1] = π     // with promoted uchar 227
bitset (value): 227
bitset  (bits): 11100011

byte[2] = u     // with promoted uchar 117
bitset (value): 117
bitset  (bits): 01110101

byte[3] = i     // with promoted uchar 105
bitset (value): 105
bitset  (bits): 01101001

byte[4] = ⌂     // with promoted uchar 127
bitset (value): 127
bitset  (bits): 01111111

byte[5] = û     // with promoted uchar 150
bitset (value): 150
bitset  (bits): 10010110

byte[6] = µ     // with promoted uchar 230
bitset (value): 230
bitset  (bits): 11100110

byte[7] = g     // with promoted uchar 103
bitset (value): 103
bitset  (bits): 01100111

Word Testing v64 = 7486836904524374950
Reference Bits:
value = 7486836904524374950
bits  = 0110011111100110100101100111111101101001011101011110001110100110

word[0] = 58278
bitset (value): 58278
bitset  (bits): 1110001110100110

word[1] = 26997
bitset (value): 26997
bitset  (bits): 0110100101110101

word[2] = 38527
bitset (value): 38527
bitset  (bits): 1001011001111111

word[3] = 26598
bitset (value): 26598
bitset  (bits): 0110011111100110

DWord Testing v64 = 7486836904524374950
Reference Bits:
value = 7486836904524374950
bits  = 0110011111100110100101100111111101101001011101011110001110100110

dword[0] = 1769333670
bitset (value): 1769333670
bitset  (bits): 01101001011101011110001110100110

dword[1] = 1743165055
bitset (value): 1743165055
bitset  (bits): 01100111111001101001011001111111

Let me know what you think!

After replacing my code with its updated version, here is a little bit about my Register classes. You can construct any of the Register types (Reg8, Reg16, Reg32 & Reg64) from any of the uint types (u8, u16, u32 & u64) by direct value. You can also construct them by pointer to, or address of, another Register type. You can also selectively construct them. What I mean by this is that you can declare a variable of type Reg16 and pass to it a u64 and an index value of, say, 2. This kind of constructor will fetch the third word from the right and use it to construct a Reg16. This kind of behavior can be done from any of the larger types to a smaller type. Give me some more time and I'll have more functionality included in these Register types. I'd like to hear your feedback!

Upvotes: 1

dhanushka

Reputation: 10702

You can do it with templates as well. Here is the code:

#include <cstdint>
#include <cassert>
#include <type_traits>

template <typename T, typename U>
void getValAtIdx(T val, uint8_t idx, U& res) {
    assert(std::is_integral<T>::value && std::is_integral<U>::value);
    assert((sizeof(val) > sizeof(res)) && (sizeof(val)/sizeof(res) > idx));
    res = (val >> ((sizeof(res) << 3)*idx)) & ((T)-1 >> ((sizeof(val)-sizeof(res)) << 3));
}

I did not do thorough testing, but I think the logic is okay.

The following should result in an assertion failure:

uint16_t res;
uint64_t val = 0x12345678;
getValAtIdx<uint64_t, uint16_t>(val, 4, res);

whereas

uint16_t res;
uint64_t val = 0x12345678;
getValAtIdx<uint64_t, uint16_t>(val, 1, res);

should give you 0x1234.
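One possible variation on the template above, sketched under a few assumptions: the type checks move to `static_assert` so they fail at compile time, the result is returned instead of written through a reference, and only unsigned types are accepted. `getPartAt` is a hypothetical name.

```cpp
#include <cassert>
#include <cstdint>
#include <type_traits>

// Hypothetical variation: extract part number `idx` (0 = lowest) of type
// Part from a wider unsigned value of type Whole.
template <typename Part, typename Whole>
Part getPartAt(Whole val, std::uint8_t idx) {
    static_assert(std::is_unsigned<Whole>::value && std::is_unsigned<Part>::value,
                  "both types must be unsigned integers");
    static_assert(sizeof(Part) < sizeof(Whole), "Part must be narrower than Whole");
    assert(idx < sizeof(Whole) / sizeof(Part)); // the index is only known at run time
    // Shift the wanted part down; the narrowing cast drops the higher bits
    return static_cast<Part>(val >> (idx * sizeof(Part) * 8));
}
```

With this sketch, `getPartAt<std::uint16_t>(std::uint64_t{0x12345678}, 1)` returns `0x1234`, while an index of 4 trips the run-time assert just as in the original.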

Upvotes: 1

Scheff's Cat

Reputation: 20171

I'm not sure whether I understood the question, but if I did, the solution is actually easy.

I start with what OP already has:

// valid values for idx[0,1]
void getByteFrom(std::uint16_t val, std::uint8_t idx, std::uint8_t& res) {  
    assert(idx == 0 || idx == 1);
    res = ((val >> (idx << 3)) & 0xff);
}

My first impression is: Replacing idx * 8 by idx << 3 is an unnecessary code obfuscation. I'm sure that any serious modern compiler can produce the same efficient code for:

// valid values for idx[0,1]
void getByteFrom(std::uint16_t val, std::uint8_t idx, std::uint8_t& res) {  
    assert(idx == 0 || idx == 1);
    res = ((val >> (idx * 8)) & 0xff);
}

This as a start for getWordFrom():

// valid idx[0,1]
void getWordFrom(std::uint32_t val, std::uint8_t idx, std::uint16_t& res);

The necessary restriction of idx to range [0, 1] was already mentioned:

void getWordFrom(std::uint32_t val, std::uint8_t idx, std::uint16_t& res)
{ 
    assert(idx == 0 || idx == 1);

The bit pattern to mask out a 16 bit value would be in binary 0b1111111111111111 (a binary number with 16 1 digits). Binary literals with the 0b prefix are only part of standard C++ since C++14, and they get unwieldy at this length. Instead, hex number literals are preferred as one hex digit always reflects exactly four binary digits because 16¹ = 2⁴. (I assume that is what makes hex numbers so well liked in the "bit shifters" community.) Hence, the bit pattern to mask out a 16 bit value: 0xffff.

To shift higher 16 bits into lower position, the idx has to be multiplied with 16.

    res = ((val >> (idx * 16)) & 0xffff);
}

That was not that complicated... (IMHO).

Please, note that the right shift (>>) is also done for idx == 0 but a right shift by 0 doesn't change the value.

An alternative implementation could be:

    res = (idx != 0 ? val >> (idx * 16) : val) & 0xffff; // NOT BETTER

which does the right shift only if idx != 0. I really doubt this would gain anything. I would always prefer the first form.

(Such micro-optimizations usually have little to no effect on overall performance and are not worth considering.)
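The same shift-and-mask pattern covers the other widths the question asks about: the shift is `idx` times the field width, and the mask has that many 1 bits (0xff, 0xffff, 0xffffffff). A compact sketch, with `fieldAt` as a hypothetical name and the restriction that the width is below 64:

```cpp
#include <cstdint>

// Hypothetical helper: extract field number `idx` (0 = lowest) of width
// `bits` from `val`. Requires bits < 64 and (idx + 1) * bits <= 64.
//   byte : bits = 8   -> (val >> (idx * 8))  & 0xff
//   word : bits = 16  -> (val >> (idx * 16)) & 0xffff
//   dword: bits = 32  -> (val >> (idx * 32)) & 0xffffffff
constexpr std::uint64_t fieldAt(std::uint64_t val, unsigned idx, unsigned bits) {
    return (val >> (idx * bits)) & ((std::uint64_t{1} << bits) - 1);
}
```

For instance, `fieldAt(135792468, 1, 16)` gives `0x0818` and `fieldAt(13579, 0, 8)` gives `0x0b`, the same values the sample program below prints.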

Sample code:

#include <cstdint>
#include <bitset>
#include <cassert>
#include <iomanip>
#include <iostream>

using U8 = std::uint8_t;
using U16 = std::uint16_t;
using U32 = std::uint32_t;

// valid values for idx[0,1]
void getByteFrom(U16 val, U8 idx, U8 &res)
{  
    assert(idx == 0 || idx == 1);
    res = ((val >> (idx * 8)) & 0xff);
}

// valid values for idx[0,1]
void getWordFrom(U32 val, U8 idx, U16 &res)
{  
    assert(idx == 0 || idx == 1);
    res = ((val >> (idx * 16)) & 0xffff);
}

// check this out
int main()
{
  {
    U16 value = 13579;
    std::bitset<16> bits{ value };
    std::cout << "Reference Bits:\n"
              << bits.to_ulong()
              << '\n' << bits << "\n\n";

    U8 lowByte = 0, highByte = 0;
    getByteFrom(value, 0, lowByte);
    getByteFrom(value, 1, highByte);

    std::bitset<8> lowBits{ lowByte };
    std::bitset<8> highBits{ highByte };

    std::cout << "lowByte = " << std::setw(2) << std::setfill('0') << std::hex << (unsigned)lowByte << std::dec << '\n';
    std::cout << "lowBits (value): " << lowBits.to_ulong() << '\n';
    std::cout << "lowBits (bits):  " << lowBits << "\n\n";

    std::cout << "highByte = " << std::setw(2) << std::setfill('0') << std::hex << (unsigned)highByte << std::dec << '\n';
    std::cout << "highBits (value): " << highBits.to_ulong() << '\n';
    std::cout << "highBits (bits):  " << highBits << "\n\n";
  }
  {
    U32 value = 135792468;
    std::bitset<32> bits{ value };
    std::cout << "Reference Bits:\n"
              << bits.to_ulong()
              << '\n' << bits << "\n\n";

    U16 lowWord = 0, highWord = 0;
    getWordFrom(value, 0, lowWord);
    getWordFrom(value, 1, highWord);

    std::bitset<16> lowBits{ lowWord };
    std::bitset<16> highBits{ highWord };

    std::cout << "lowWord = " << std::setw(4) << std::setfill('0') << std::hex << lowWord << std::dec << '\n';
    std::cout << "lowBits (value): " << lowBits.to_ulong() << '\n';
    std::cout << "lowBits (bits):  " << lowBits << "\n\n";

    std::cout << "highWord = " << std::setw(4) << std::setfill('0') << std::hex << highWord << std::dec << '\n';
    std::cout << "highBits (value): " << highBits.to_ulong() << '\n';
    std::cout << "highBits (bits):  " << highBits << "\n\n";
  }
}

Output:

Reference Bits:
13579
0011010100001011

lowByte = 0b
lowBits (value): 11
lowBits (bits):  00001011

highByte = 35
highBits (value): 53
highBits (bits):  00110101

Reference Bits:
135792468
00001000000110000000011101010100

lowWord = 0754
lowBits (value): 1876
lowBits (bits):  0000011101010100

highWord = 0818
highBits (value): 2072
highBits (bits):  0000100000011000

Live Demo on coliru

Upvotes: 2
