user3704313

Reputation: 47

Converting hex String to structure

I've got a file containing a large string of hexadecimal. Here's the first few lines:

0000038f
0000111d
0000111d
03030303
//Goes on for a long time

I have a large struct that is intended to hold that data:

typedef struct
{
  unsigned int field1: 5;
  unsigned int field2: 11;
  unsigned int field3: 16;
  //Goes on for a long time
}calibration;

What I want to do is read the above string and store it in the struct. I can assume the input is valid (it's verified before I get it).

I've already got a loop that reads the file and puts the whole item in a string:

std::string line = "";
std::string hexText = "";
while(std::getline(readFile, line))
{
  hexText += line;
}
//Convert string into calibration
//Convert string into long int
long int hexInt = strtol(hexText.c_str(), NULL, 16);
//Here I get stuck: How to get from long int to calibration...?

Upvotes: 2

Views: 2449

Answers (2)

2785528

Reputation: 5566

How to get from long int to calibration...?

Cameron's answer is good, and probably what you want.

I offer here another (maybe not so different) approach.


Note1: Your file input needs re-work. I suggest:

a) use getline() to fetch one line at a time into a string

b) convert that one entry to a uint32_t (I would use a stringstream instead of strtol)

c) then install the uint32_t in your structure, for which my offering below might offer insight.

Once you learn how to detect and recover from invalid input, you could combine a) and b) into one step.


Note2: I have worked many years with bit fields, and have developed a distaste for them. I have never found them more convenient than the alternatives.

The alternative I prefer is bit masks and field shifting.

So far as we can tell from your problem statement, your problem does not need bit fields (which Cameron's answer illustrates).


Note3: Not all compilers will pack these bit fields for you.

The last compiler I used required what is called a "pragma" to do so.

G++ 4.8 on Ubuntu seemed to pack the bit fields just fine (i.e. no pragma needed).

The sizeof(calibration) for your original code is 4 ... i.e. packed.

Another issue is that packing can unexpectedly change when you change compiler options, upgrade the compiler, or switch compilers.

My team's work-around was to always have an assert against struct size and a few byte offsets in the CTOR.


Note4: I did not illustrate the use of 'union' to align a uint32_t array over your calibration struct.

This may be preferred over the reinterpret_cast approach. Check your requirements, team lead, professor.


Anyway, in the spirit of your original effort, consider the following additions to your struct calibration:

  #include <cstdint>
  #include <iostream>
  #include <iomanip>

  typedef struct
  {
     uint32_t field1 :  5;
     uint32_t field2 : 11;
     uint32_t field3 : 16;
     //Goes on for a long time

     // I made up these next 2 fields for illustration
     uint32_t field4 :  8;
     uint32_t field5 : 24;

     // ... add more fields here

     // something typically done by ctor or used by ctor
     void clear() { field1 = 0; field2 = 0; field3 = 0; field4 = 0; field5 = 0; }

     void show123(const char* lbl=0) {
        if(0 == lbl) lbl = " ";
        std::cout << std::setw(16) << lbl;
        std::cout << "     " << std::setw(5) << std::hex << field3 << std::dec 
                  << "     " << std::setw(5) << std::hex << field2 << std::dec 
                  << "     " << std::setw(5) << std::hex << field1 << std::dec 
                  << "     0x" << std::hex << std::setfill('0') << std::setw(8) 
                  << *(reinterpret_cast<uint32_t*>(this))
                  << "    => "  << std::dec << std::setfill(' ') 
                  << *(reinterpret_cast<uint32_t*>(this))
                  << std::endl;
     } // show
     // I did not create show456() ... 

     // 1st uint32_t: set new val, return previous
     uint32_t set123(uint32_t nxtVal) {
        uint32_t* myVal = reinterpret_cast<uint32_t*>(this);
        uint32_t prevVal = myVal[0];
        myVal[0] = nxtVal;
        return (prevVal);
     }

     // return current value of the combined field1, field2 field3
     uint32_t get123(void) {
        uint32_t* myVal = reinterpret_cast<uint32_t*>(this);
        return  (myVal[0]);
     }

     // 2nd uint32_t: set new val, return previous
     uint32_t set45(uint32_t nxtVal) {
        uint32_t* myVal = reinterpret_cast<uint32_t*>(this);
        uint32_t prevVal = myVal[1];
        myVal[1] = nxtVal;
        return (prevVal);
     }

     // return current value of the combined field4, field5
     uint32_t get45(void) {
        uint32_t* myVal = reinterpret_cast<uint32_t*>(this);
        return  (myVal[1]);
     }


     // guess that next 4 fields fill 32 bits
     uint32_t get6789(void) {
        uint32_t* myVal = reinterpret_cast<uint32_t*>(this);
        return  (myVal[2]);
     }
     // ... tedious expansion

  } calibration;

Here is some test code to illustrate the use:

  uint32_t t125()
  {
     const char* lbl = 
        "\n                    16 bits   11 bits    5 bits      hex         => dec";

     calibration cal;
     cal.clear();
     std::cout << lbl << std::endl;
     cal.show123();

     cal.field1 = 1;
     cal.show123("field1 =     1");
     cal.clear();
     cal.field1 = 31;
     cal.show123("field1 =    31");
     cal.clear();

     cal.field2 = 1;
     cal.show123("field2 =     1");
     cal.clear();
     cal.field2 = (2047 & 0x07ff);
     cal.show123("field2 =  2047");
     cal.clear();

     cal.field3 = 1;
     cal.show123("field3 =     1");
     cal.clear();
     cal.field3 = (65535 & 0x0ffff);
     cal.show123("field3 = 65535");

     cal.set123 (0xABCD6E17);
     cal.show123 ("set123(0x...)");

     cal.set123 (0xffffffff);
     cal.show123 ("set123(0x...)");

     cal.set123 (0x0);
     cal.show123 ("set123(0x...)");

     std::cout << "\n";

     cal.clear();
     std::cout << "get123(): " << cal.get123() << std::endl;
     std::cout << " get45(): " << cal.get45() << std::endl;

     // values from your file:
     cal.set123 (0x0000038f);
     cal.set45  (0x0000111d);

     std::cout << "get123(): " << "0x"  << std::hex << std::setfill('0') 
               << std::setw(8) << cal.get123() << std::endl;
     std::cout << " get45(): " << "0x"  << std::hex << std::setfill('0') 
               << std::setw(8) <<  cal.get45() << std::endl;

     // cal.set6789 (0x03030303);
     // std::cout << "get6789(): " << cal.get6789() << std::endl;

     // ...

     return(0);
  }

And the test code output:

                    16 bits   11 bits    5 bits      hex         => dec
                         0         0         0     0x00000000    => 0
  field1 =     1         0         0         1     0x00000001    => 1
  field1 =    31         0         0        1f     0x0000001f    => 31
  field2 =     1         0         1         0     0x00000020    => 32
  field2 =  2047         0       7ff         0     0x0000ffe0    => 65504
  field3 =     1         1         0         0     0x00010000    => 65536
  field3 = 65535      ffff         0         0     0xffff0000    => 4294901760
   set123(0x...)      abcd       370        17     0xabcd6e17    => 2882366999
   set123(0x...)      ffff       7ff        1f     0xffffffff    => 4294967295
   set123(0x...)         0         0         0     0x00000000    => 0

get123(): 0
 get45(): 0
get123(): 0x0000038f
 get45(): 0x0000111d

The goal of this code is to help you see how the bit fields map into the lsbyte through msbyte of the data.

Upvotes: 2

Cameron

Reputation: 98816

If you care at all about efficiency, don't read the whole thing into a string and then convert it. Simply read one word at a time, and convert that. Your loop should look something like:

calibration c;
uint32_t* base = reinterpret_cast<uint32_t*>(&c);
uint32_t* dest = base;
while (true) {
    char hexText[8];
    // TODO: Attempt to read 8 bytes from file and then skip whitespace
    // TODO: Break out of the loop on EOF

    std::uint32_t hexValue = 0;    // TODO: Convert hex to dword

    // Assumes the structure padding & packing matches the dump version's
    // Assumes the structure size is exactly a multiple of 4 bytes (w/ padding)
    static_assert(sizeof(calibration) % 4 == 0, "size must be a multiple of 4 bytes");
    assert(dest - base < sizeof(calibration) / 4 && "Too much data");
    *dest++ = hexValue;
}
assert(dest - base == sizeof(calibration) / 4 && "Too little data");

Converting 8 chars of hex to an actual 4-byte int is a good exercise and is well-covered elsewhere, so I've left it out (along with the file reading, which is similarly well-covered).

Note the two assumptions in the loop: the first one cannot be checked at run time or at compile time; it must either be agreed upon in advance, or extra work must be done to properly serialize the structure (handling structure packing and padding, etc.). The second one can at least be checked at compile time with the static_assert.

Also, care has to be taken to ensure that the endianness of the hex bytes in the file matches the endianness of the architecture executing the program when converting the hex string. This depends on whether the hex was written in a specific endianness in the first place (in which case you can convert it from the known endianness to the current architecture's endianness quite easily), or whether it's architecture-dependent (in which case you have no choice but to assume the endianness is the same as your current architecture's).

Upvotes: 1
