Tanny

Reputation: 3

Iterating through bits of a char array in C++

I am still learning C++ and was hoping I could get some help with this problem. I am trying to iterate over an array of char in C++ and I am having some trouble.

So the way I understand things at present is that an array of char is just X amount of 8 bit values stored next to each other in memory (I could be entirely wrong here) ending with a 00.

So what I would like to do is to iterate over this collection of bits in memory and combine them into smaller or larger segments. An example would be if I had 8 chars and I wanted to turn that string of bits into two 32 bit integers or one 64 bit integer.

There is no real point to this other than me trying to give myself a better understanding of how C++ stores variables so if what I am doing is either not possible or just blatantly stupid feel free to let me know :D

Upvotes: 0

Views: 1122

Answers (1)

Dai

Reputation: 155726

So the way I understand things at present is that an array of char is just X amount of 8 bit values stored next to each other in memory...

Almost correct, but a char is not guaranteed to be 8 bits (an octet) in C and C++: the standards only require at least 8 bits, and the actual width is given by the CHAR_BIT macro. Remember that C and C++ can target almost any processor and ISA in existence, including rare and exotic machines with their own peculiarities. I recommend reading this Q&A: Will a `char` always-always-always have 8 bits?
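As a rough sketch of what "iterating through bits" can look like without hard-coding 8, the helper below reads a single bit out of a byte array using CHAR_BIT. The name bitAt and the choice to count from the most significant bit of the first byte are my own assumptions for illustration, not anything the language mandates.

```cpp
#include <cassert>
#include <climits>   // CHAR_BIT: number of bits in a char on this platform
#include <cstddef>

// Hypothetical helper: returns bit number `bitIndex` of the array, where
// bit 0 is the most significant bit of bytes[0] (an arbitrary convention).
bool bitAt(const unsigned char* bytes, std::size_t byteCount, std::size_t bitIndex) {
    assert(bitIndex < byteCount * CHAR_BIT);              // stay inside the array
    const std::size_t byteIndex = bitIndex / CHAR_BIT;    // which char holds the bit
    const std::size_t shift = CHAR_BIT - 1 - (bitIndex % CHAR_BIT);
    return ((bytes[byteIndex] >> shift) & 1u) != 0;
}
```

Looping bitIndex from 0 up to byteCount * CHAR_BIT visits every bit in order. Using unsigned char avoids surprises from sign extension when char is signed.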

...ending with a 00.

This is an assumption that's not entirely correct, sorry.

While a "string" must have a terminator (as per the C language specification), an array of characters does not necessarily have a null terminator (the '\0' char at the end). A string literal always has a null terminator appended, and so does a char array initialized from one, but you can still construct a char array without one.
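To make the difference concrete, here is a small sketch with two arrays, one built from a literal and one built element by element. The names fromLiteral, noTerminator and boundedLength are made up for this example. The point is that without a terminator you have to carry the array's size around yourself.

```cpp
#include <cassert>
#include <cstddef>

const char fromLiteral[]  = "abc";             // 4 elements: 'a', 'b', 'c', '\0'
const char noTerminator[] = {'a', 'b', 'c'};   // 3 elements, no terminator at all

// Hypothetical helper: counts chars before the first '\0', but never reads
// more than `capacity` elements, so it is safe for unterminated arrays too.
std::size_t boundedLength(const char* data, std::size_t capacity) {
    assert(data != nullptr);
    std::size_t n = 0;
    while (n < capacity && data[n] != '\0') {
        ++n;
    }
    return n;
}
```

Calling boundedLength(noTerminator, sizeof noTerminator) is fine, whereas std::strlen(noTerminator) would read past the end of the array looking for a '\0' that is not there.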

So what I would like to do is to iterate over this collection of bits in memory and combine them into smaller or larger segments. An example would be if I had 8 chars and I wanted to turn that string of bits into two 32 bit integers or one 64 bit integer.

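If you want to force C++ to interpret a range of memory (that is, 8 octet-bytes, or 8 chars long) as an integer, use reinterpret_cast to tell C++ to treat the string's pointer as a pointer to a 64-bit integer, then read the value it points to. Be aware that reading through such a pointer breaks the language's aliasing rules and may be misaligned, so it is formally undefined behaviour even though it tends to work on mainstream compilers; copying the bytes with std::memcpy is the well-defined way to get the same result.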

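#include <cstdint>

const char* stringFromLiteral = "abcdefgh";

// The cast must keep the const qualifier: reinterpret_cast cannot cast away const.
const uint64_t* pointerToStringLiteralPretendingToBePointerToUInt64 = reinterpret_cast<const uint64_t*>( stringFromLiteral );

// Reads 8 bytes starting at the literal's address (the terminator is not included).
uint64_t asUnsigned64bitInteger = *pointerToStringLiteralPretendingToBePointerToUInt64;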
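For comparison, here is a minimal sketch of the same reinterpretation done with std::memcpy, which sidesteps the aliasing and alignment concerns of dereferencing a cast pointer. The helper names loadU64 and loadU32 are hypothetical, and the sketch assumes 8-bit chars. The numeric result still depends on the byte order of the machine running it.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>   // std::memcpy

// Hypothetical helper: copies the first 8 bytes of `bytes` into a uint64_t.
std::uint64_t loadU64(const char* bytes, std::size_t length) {
    assert(length >= sizeof(std::uint64_t));
    std::uint64_t value = 0;
    std::memcpy(&value, bytes, sizeof value);   // byte-for-byte copy, no aliasing issue
    return value;
}

// Hypothetical helper: copies 4 bytes starting at `offset` into a uint32_t.
std::uint32_t loadU32(const char* bytes, std::size_t length, std::size_t offset) {
    assert(offset + sizeof(std::uint32_t) <= length);
    std::uint32_t value = 0;
    std::memcpy(&value, bytes + offset, sizeof value);
    return value;
}
```

With "abcdefgh" as input, loadU64 gives the single 64-bit integer from the question, and loadU32 at offsets 0 and 4 gives the two 32-bit integers. Compilers typically turn these small fixed-size copies into a plain load, so there is usually no cost over the cast.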

In this case, here's what the process' read-only memory and stack (probably) looks like, assuming that read-only memory is at 0x0800 and the current function's stack-frame starts at 0x1000, and it's a 32-bit big-endian word machine (so sizeof(char*) == 4) and all values are aligned to 16-bit boundaries:

(Each line is a span of 8 bytes of memory, prefixed with the address of its first byte. Each hexadecimal number after the line's address represents a single char (octet-byte) value. Each .... represents an octet with an undefined value; in reality, its value would be either whatever was left behind by the last user, 0x00 (for pre-zeroed memory) or some debugger-generated overflow-detection test pattern.)

0x0800    0x61 0x62 0x63 0x64 0x65 0x66 0x67 0x68     # The "abcdefgh" string literal is in read-only memory at 0x0800 through 0x0808, including the 0x00 terminator byte.
0x0808    0x00 .... .... .... .... .... .... ....
0x0810    .... .... .... .... .... .... .... ....

[ Jump forward about 0x800 bytes ]

0x1000    0x00 0x00 0x08 0x00 .... .... .... ....     # The `stringFromLiteral` variable is a 4-byte pointer holding the string's address, 0x0800 (big-endian).
0x1008    .... .... .... .... .... .... .... ....
0x1010    0x61 0x62 0x63 0x64 0x65 0x66 0x67 0x68     # The `asUnsigned64bitInteger` value is a 64-bit value that is the same as 8 bytes copied from 0x0800, but without the terminator.
0x1018    .... .... .... .... .... .... .... ....
0x1020    .... .... .... .... .... .... .... ....
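On the big-endian machine assumed above, the 8 bytes at 0x1010 read as the number 0x6162636465666768. A little-endian machine would store the same bytes but read them as 0x6867666564636261. If you want the same number everywhere, one option is to combine the chars yourself with shifts. This is only a sketch, assuming 8-bit chars and ASCII, and combineBigEndian is a name I made up:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: folds up to 8 chars into one integer, treating
// bytes[0] as the most significant byte regardless of the host's byte order.
std::uint64_t combineBigEndian(const char* bytes, std::size_t count) {
    assert(count <= 8);   // a uint64_t only has room for 8 octets
    std::uint64_t value = 0;
    for (std::size_t i = 0; i < count; ++i) {
        // Go through unsigned char so a negative char is not sign-extended.
        value = (value << 8) | static_cast<unsigned char>(bytes[i]);
    }
    return value;
}
```

Calling it with 8 chars gives the one 64-bit integer from the question; calling it twice with 4 chars each (at offsets 0 and 4) gives the two 32-bit values, which fit in a uint32_t after a cast.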

Upvotes: 1
