Reputation: 4039
I wanted to extract bytes from an 8-byte type, something like char func(long long number, size_t offset), so that for offset n I get the n-th byte (0 <= n <= 7). While doing so I realized I have no idea how an 8-byte variable is actually represented in memory. I hope you can help me figure it out.
I first wrote a short Python script to print numbers that have an A (ASCII value 65) in each byte:
sumx = 0
for x in range(8):
    # put an 'A' (65) into byte x of the running total
    sumx += (ord('A')*256**x)
    print('x {} sumx {}'.format(x,sumx))
The output is
x 0 sumx 65
x 1 sumx 16705
x 2 sumx 4276545
x 3 sumx 1094795585
x 4 sumx 280267669825
x 5 sumx 71748523475265
x 6 sumx 18367622009667905
x 7 sumx 4702111234474983745
In my mind each number is a bunch of As followed by 0s. Next I wrote a short C++ program to extract the n-th byte:
#include <iostream>
#include <array>
char func0(long long number, size_t offset)
{
    offset <<= 3;
    return (number & (0x00000000000000FF << offset)) >> offset;
}
char func1(long long unsigned number, size_t offset)
{
    char* ptr = (char*)&number;
    return ptr[offset];
}
int main()
{
    std::array<long long,8> arr{65,16705,4276545,1094795585,280267669825,71748523475265,18367622009667905,4702111234474983745};
    for (int i = 0; i < arr.size(); i++)
        for (int j = 0; j < sizeof(long long unsigned); j++)
            std::cout << "char " << j << " in number " << i << " (" << arr[i] << ") func0 " << func0(arr[i], j) << " func1 " << func1(arr[i], j) << std::endl;
    return 0;
}
Here is the program output (notice the difference starting at the 5th byte):
~ # g++ -std=c++11 prog.cpp -o prog; ./prog
char 0 in number 0 (65) func0 A func1 A
char 1 in number 0 (65) func0 func1
char 2 in number 0 (65) func0 func1
char 3 in number 0 (65) func0 func1
char 4 in number 0 (65) func0 func1
char 5 in number 0 (65) func0 func1
char 6 in number 0 (65) func0 func1
char 7 in number 0 (65) func0 func1
char 0 in number 1 (16705) func0 A func1 A
char 1 in number 1 (16705) func0 A func1 A
char 2 in number 1 (16705) func0 func1
char 3 in number 1 (16705) func0 func1
char 4 in number 1 (16705) func0 func1
char 5 in number 1 (16705) func0 func1
char 6 in number 1 (16705) func0 func1
char 7 in number 1 (16705) func0 func1
char 0 in number 2 (4276545) func0 A func1 A
char 1 in number 2 (4276545) func0 A func1 A
char 2 in number 2 (4276545) func0 A func1 A
char 3 in number 2 (4276545) func0 func1
char 4 in number 2 (4276545) func0 func1
char 5 in number 2 (4276545) func0 func1
char 6 in number 2 (4276545) func0 func1
char 7 in number 2 (4276545) func0 func1
char 0 in number 3 (1094795585) func0 A func1 A
char 1 in number 3 (1094795585) func0 A func1 A
char 2 in number 3 (1094795585) func0 A func1 A
char 3 in number 3 (1094795585) func0 A func1 A
char 4 in number 3 (1094795585) func0 func1
char 5 in number 3 (1094795585) func0 func1
char 6 in number 3 (1094795585) func0 func1
char 7 in number 3 (1094795585) func0 func1
char 0 in number 4 (280267669825) func0 A func1 A
char 1 in number 4 (280267669825) func0 A func1 A
char 2 in number 4 (280267669825) func0 A func1 A
char 3 in number 4 (280267669825) func0 A func1 A
char 4 in number 4 (280267669825) func0 func1 A
char 5 in number 4 (280267669825) func0 func1
char 6 in number 4 (280267669825) func0 func1
char 7 in number 4 (280267669825) func0 func1
char 0 in number 5 (71748523475265) func0 A func1 A
char 1 in number 5 (71748523475265) func0 A func1 A
char 2 in number 5 (71748523475265) func0 A func1 A
char 3 in number 5 (71748523475265) func0 A func1 A
char 4 in number 5 (71748523475265) func0 func1 A
char 5 in number 5 (71748523475265) func0 func1 A
char 6 in number 5 (71748523475265) func0 func1
char 7 in number 5 (71748523475265) func0 func1
char 0 in number 6 (18367622009667905) func0 A func1 A
char 1 in number 6 (18367622009667905) func0 A func1 A
char 2 in number 6 (18367622009667905) func0 A func1 A
char 3 in number 6 (18367622009667905) func0 A func1 A
char 4 in number 6 (18367622009667905) func0 func1 A
char 5 in number 6 (18367622009667905) func0 func1 A
char 6 in number 6 (18367622009667905) func0 func1 A
char 7 in number 6 (18367622009667905) func0 func1
char 0 in number 7 (4702111234474983745) func0 A func1 A
char 1 in number 7 (4702111234474983745) func0 A func1 A
char 2 in number 7 (4702111234474983745) func0 A func1 A
char 3 in number 7 (4702111234474983745) func0 A func1 A
char 4 in number 7 (4702111234474983745) func0 func1 A
char 5 in number 7 (4702111234474983745) func0 func1 A
char 6 in number 7 (4702111234474983745) func0 func1 A
char 7 in number 7 (4702111234474983745) func0 A func1 A
This code has two functions: func1, which returns the expected values, and func0, which I assumed would return the same values as func1. It doesn't, and I'm not sure why. Basically I think of an 8-byte type as an array of 8 bytes, and func1 clearly shows that this is the case in some sense. I'm not sure why using bit shifts to get the n-th byte doesn't work, and I'm not sure I completely understand how 8-byte variables are arranged in memory.
Upvotes: 6
Views: 994
Reputation: 5232
A well-defined way to analyze the underlying memory representation of a variable is to use memcpy to copy it into a char array (ref: C aliasing rules and memcpy):
#include <cstddef>
#include <cstring>
char get_char(long long num, std::size_t offs)
{
    char array[sizeof(long long)];
    // copy the object representation byte for byte, in memory order
    std::memcpy(array, &num, sizeof(long long));
    return array[offs];
}
Then for the following example:
#include <iostream>
int main()
{
    long long var = 0x7766554433221100;
    for (std::size_t idx = 0; idx < sizeof(long long); ++idx)
        std::cout << '[' << idx << ']' << '=' << std::hex << static_cast<int>(get_char(var, idx)) << '\n';
}
On little-endian systems we get:
[0]=0
[1]=11
[2]=22
[3]=33
[4]=44
[5]=55
[6]=66
[7]=77
On big-endian systems we get:
[0]=77
[1]=66
[2]=55
[3]=44
[4]=33
[5]=22
[6]=11
[7]=0
(https://en.wikipedia.org/wiki/Endianness)
(https://godbolt.org/z/xrPMVw)
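To make the link between array index and byte order concrete, here is a minimal sketch built on the same memcpy idea. The names byte_in_memory and is_little_endian are made up for this illustration, and the check only distinguishes plain little-endian from everything else.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Return the byte stored at position `index` in memory for a 64-bit value.
// memcpy copies the object representation, so no aliasing question arises.
unsigned char byte_in_memory(std::uint64_t value, std::size_t index)
{
    assert(index < sizeof value);
    unsigned char bytes[sizeof value];
    std::memcpy(bytes, &value, sizeof value);
    return bytes[index];
}

// Hypothetical helper: true when the least significant byte is stored first.
bool is_little_endian()
{
    return byte_in_memory(0x0102030405060708ull, 0) == 0x08;
}
```

With this, byte_in_memory(0x7766554433221100, 0) is 0x00 when is_little_endian() returns true and 0x77 on a big-endian machine, which is the same pattern as the two listings above.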
Upvotes: 2
Reputation: 66
The problem in func0 is that your hex literal, although written with 16 digits, has type int, because it has no suffix and its value fits in an int. Using 0xffULL (0xff as unsigned long long) instead of 0x00000000000000ff should get you what you want.
The clue was that it worked perfectly for the first 32 bits and fell down after that. I'm at a loss to explain where that last A (byte 7 of the final number) came from, though.
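As for the unexpected A in byte 7, one plausible explanation (not a guarantee, since shifting an int by 32 or more is undefined behavior) is that on x86-64 the 32-bit shift uses only the low 5 bits of the count, and that the resulting int mask is sign-extended when it meets the 64-bit number. The sketch below imitates that with well-defined arithmetic. The name func0_as_observed and the whole model are assumptions made for illustration, not something the compiler promises.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical model of what func0 may end up doing on x86-64.
// It only imitates one plausible outcome of the undefined shift.
char func0_as_observed(std::uint64_t number, std::size_t byte_index)
{
    assert(byte_index < 8);
    const unsigned shift = static_cast<unsigned>(byte_index * 8);

    // Assumption: the 32-bit shift looks only at the low 5 bits of the count,
    // so a shift by 56 behaves like a shift by 24.
    const std::uint32_t mask32 = std::uint32_t{0xFF} << (shift & 31u);

    // The mask is a signed int in the original code, so a set top bit is
    // sign-extended into the upper 32 bits of the 64-bit operand.
    std::uint64_t mask64 = mask32;
    if (mask32 & 0x80000000u)
        mask64 |= 0xFFFFFFFF00000000ull;

    // The outer shift works on the 64-bit value and is well defined.
    const std::uint64_t shifted = (number & mask64) >> shift;
    return static_cast<char>(static_cast<unsigned char>(shifted));
}
```

Under these assumptions the model reproduces the output in the question: for 4702111234474983745 it yields A for bytes 0 to 3, zero for bytes 4 to 6, and A again for byte 7, because the shift by 56 acts like a shift by 24 and the sign-extended mask lets the top byte through.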
Upvotes: 2
Reputation: 114579
The problem is that in the code
0x00000000000000FF << offset
the number 0xFF on the left is just an int (no matter how many leading zeros you write), so the left shift is carried out in int and yields an int. Shifting by the width of the type or more is undefined behavior, so with a 32-bit int the result cannot be relied on from the 5th byte onward.
Using this instead:
0xFFull << offset
solves the issue, because the ull suffix makes the literal an unsigned long long.
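A minimal sketch of both points, assuming a C++11 compiler: the static_asserts check the literal types at compile time, and func0_fixed (an illustrative name) is the question's func0 with the widened mask. The parameter is taken as unsigned long long here to keep the whole expression unsigned, which is a choice of this sketch rather than part of the original code.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>

// Leading zeros do not change a literal's type: this one is still an int.
static_assert(std::is_same<decltype(0x00000000000000FF), int>::value,
              "unsuffixed 0xFF is an int");
static_assert(std::is_same<decltype(0xFFull), unsigned long long>::value,
              "the ull suffix makes it unsigned long long");

// func0 from the question with a 64-bit mask.
char func0_fixed(unsigned long long number, std::size_t offset)
{
    assert(offset < sizeof number);
    offset <<= 3;  // byte index -> bit count
    return static_cast<char>((number & (0xFFull << offset)) >> offset);
}
```

For example, func0_fixed(280267669825, 4) now returns 'A', where the original func0 printed an empty byte.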
Of course, as said in another answer, (number >> (offset * 8)) & 0xFF is simpler and works.
Upvotes: 5
Reputation: 474256
This is an extremely overcomplicated way to do something very simple. You don't even need to consider endian issues, because you don't need to access the memory representation of a long long just to get a byte.
Getting the n-th byte is simply a matter of masking away all other bytes and converting that value to an unsigned char. So like this:
unsigned char nth_byte(unsigned long long int value, int n)
{
    //Assert that n is on the range [0, 8)
    value = value >> (8 * n); //Move the desired byte into the first byte.
    value = value & 0xFF; //Mask away everything that isn't the first byte.
    return static_cast<unsigned char>(value); //Return the first byte.
}
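Because this works on the value rather than on memory, byte 0 is always the least significant byte, whatever the machine's endianness. A small sketch of that property: byte_of restates the shift-and-mask idea so the example is self-contained, and from_bytes is a hypothetical inverse added only to show the round trip.

```cpp
#include <cassert>
#include <cstdint>

// Same shift-and-mask idea as nth_byte above.
unsigned char byte_of(std::uint64_t value, int n)
{
    assert(n >= 0 && n < 8);
    return static_cast<unsigned char>((value >> (8 * n)) & 0xFF);
}

// Hypothetical inverse: rebuild the value from its bytes,
// least significant byte first.
std::uint64_t from_bytes(const unsigned char (&bytes)[8])
{
    std::uint64_t value = 0;
    for (int n = 0; n < 8; ++n)
        value |= static_cast<std::uint64_t>(bytes[n]) << (8 * n);
    return value;
}
```

Splitting 0x7766554433221100 with byte_of gives 0x00 for n = 0 and 0x77 for n = 7 on any platform, and feeding the eight bytes back into from_bytes returns the original value.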
Upvotes: 8