Reputation: 4039

long long type representation in memory

I wanted to extract bytes from an 8-byte type, something like char func(long long number, size_t offset), so that for offset n I get the nth byte (0 <= n <= 7). While doing so I realized I have no idea how an 8-byte variable is actually represented in memory. I hope you can help me figure it out. I first wrote a short Python script to print numbers made of As (ASCII value 65) in each byte:

sumx = 0
for x in range(8):
    sumx += (ord('A')*256**x)
    print('x {} sumx {}'.format(x,sumx))

The output is

x 0 sumx 65
x 1 sumx 16705
x 2 sumx 4276545
x 3 sumx 1094795585
x 4 sumx 280267669825
x 5 sumx 71748523475265
x 6 sumx 18367622009667905
x 7 sumx 4702111234474983745

In my mind each number is a bunch of As followed by 0s. Next I wrote a short C++ program to extract the nth byte:

#include <iostream>
#include <array>

char func0(long long number, size_t offset)
{
  offset <<= 3;
  return (number & (0x00000000000000FF << offset)) >> offset;
}

char func1(long long unsigned number, size_t offset)
{
  char* ptr = (char*)&number;
  return ptr[offset];
}

int main()
{
  std::array<long long,8> arr{65,16705,4276545,1094795585,280267669825,71748523475265,18367622009667905,4702111234474983745};
  for (int i = 0; i < arr.size(); i++)
    for (int j = 0; j < sizeof(long long unsigned); j++)
      std::cout << "char " << j << " in number " << i << " (" << arr[i] << ") func0 " << func0(arr[i], j) << " func1 " << func1(arr[i], j) << std::endl;
  return 0;
}

Here is the program output (notice the difference starting at the 5th byte):

~ # g++ -std=c++11 prog.cpp -o prog; ./prog
char 0 in number 0 (65) func0 A func1 A
char 1 in number 0 (65) func0  func1
char 2 in number 0 (65) func0  func1
char 3 in number 0 (65) func0  func1
char 4 in number 0 (65) func0  func1
char 5 in number 0 (65) func0  func1
char 6 in number 0 (65) func0  func1
char 7 in number 0 (65) func0  func1
char 0 in number 1 (16705) func0 A func1 A
char 1 in number 1 (16705) func0 A func1 A
char 2 in number 1 (16705) func0  func1
char 3 in number 1 (16705) func0  func1
char 4 in number 1 (16705) func0  func1
char 5 in number 1 (16705) func0  func1
char 6 in number 1 (16705) func0  func1
char 7 in number 1 (16705) func0  func1
char 0 in number 2 (4276545) func0 A func1 A
char 1 in number 2 (4276545) func0 A func1 A
char 2 in number 2 (4276545) func0 A func1 A
char 3 in number 2 (4276545) func0  func1
char 4 in number 2 (4276545) func0  func1
char 5 in number 2 (4276545) func0  func1
char 6 in number 2 (4276545) func0  func1
char 7 in number 2 (4276545) func0  func1
char 0 in number 3 (1094795585) func0 A func1 A
char 1 in number 3 (1094795585) func0 A func1 A
char 2 in number 3 (1094795585) func0 A func1 A
char 3 in number 3 (1094795585) func0 A func1 A
char 4 in number 3 (1094795585) func0  func1
char 5 in number 3 (1094795585) func0  func1
char 6 in number 3 (1094795585) func0  func1
char 7 in number 3 (1094795585) func0  func1
char 0 in number 4 (280267669825) func0 A func1 A
char 1 in number 4 (280267669825) func0 A func1 A
char 2 in number 4 (280267669825) func0 A func1 A
char 3 in number 4 (280267669825) func0 A func1 A
char 4 in number 4 (280267669825) func0  func1 A
char 5 in number 4 (280267669825) func0  func1
char 6 in number 4 (280267669825) func0  func1
char 7 in number 4 (280267669825) func0  func1
char 0 in number 5 (71748523475265) func0 A func1 A
char 1 in number 5 (71748523475265) func0 A func1 A
char 2 in number 5 (71748523475265) func0 A func1 A
char 3 in number 5 (71748523475265) func0 A func1 A
char 4 in number 5 (71748523475265) func0  func1 A
char 5 in number 5 (71748523475265) func0  func1 A
char 6 in number 5 (71748523475265) func0  func1
char 7 in number 5 (71748523475265) func0  func1
char 0 in number 6 (18367622009667905) func0 A func1 A
char 1 in number 6 (18367622009667905) func0 A func1 A
char 2 in number 6 (18367622009667905) func0 A func1 A
char 3 in number 6 (18367622009667905) func0 A func1 A
char 4 in number 6 (18367622009667905) func0  func1 A
char 5 in number 6 (18367622009667905) func0  func1 A
char 6 in number 6 (18367622009667905) func0  func1 A
char 7 in number 6 (18367622009667905) func0  func1
char 0 in number 7 (4702111234474983745) func0 A func1 A
char 1 in number 7 (4702111234474983745) func0 A func1 A
char 2 in number 7 (4702111234474983745) func0 A func1 A
char 3 in number 7 (4702111234474983745) func0 A func1 A
char 4 in number 7 (4702111234474983745) func0  func1 A
char 5 in number 7 (4702111234474983745) func0  func1 A
char 6 in number 7 (4702111234474983745) func0  func1 A
char 7 in number 7 (4702111234474983745) func0 A func1 A

This code has two functions: func1, which returns the expected values, and func0, which I assumed would return the same values as func1 but doesn't, and I'm not sure why. Basically I think of an 8-byte type as an array of 8 bytes, and func1 clearly shows this is the case in some sense. I'm not sure why using bit shifts to get the nth byte is not working, and I'm not sure I completely understand how 8-byte variables are arranged in memory.

Upvotes: 6

Answers (4)

Robert Andrzejuk

Reputation: 5232

The correct way to analyze the underlying memory representation of a variable is to use memcpy and copy to a char array (ref: C aliasing rules and memcpy):

#include <cstddef>
#include <cstring>
#include <iostream>

char get_char(long long num, size_t offs)
{
    char array[sizeof(long long)];

    memcpy(array, &num, sizeof(long long));

    return array[offs];
}

Then for the following example:

int main()
{
    long long var = 0x7766554433221100;

    for (size_t idx = 0; idx < sizeof(long long); ++idx)
        std::cout << '[' << idx << ']' << '=' << std::hex << static_cast<int>(get_char(var, idx)) << '\n';
}

On little-endian systems we get:

[0]=0
[1]=11
[2]=22
[3]=33
[4]=44
[5]=55
[6]=66
[7]=77

On big-endian systems we get:

[0]=77
[1]=66
[2]=55
[3]=44
[4]=33
[5]=22
[6]=11
[7]=0

(https://en.wikipedia.org/wiki/Endianness)

(https://godbolt.org/z/xrPMVw)
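As a rough sketch of the same memcpy idea, the copy can also be used to check which byte order the current machine uses. The names bytes_in_memory and is_little_endian are made up for this illustration and are not part of the answer above; the check only distinguishes "least significant byte first" from everything else.

```cpp
#include <array>
#include <cassert>
#include <cstring>

// Copies the object representation of a value into a byte array.
// memcpy is used so that no aliasing rules are violated.
std::array<unsigned char, sizeof(unsigned long long)>
bytes_in_memory(unsigned long long value)
{
    std::array<unsigned char, sizeof(unsigned long long)> bytes{};
    std::memcpy(bytes.data(), &value, sizeof(value));
    return bytes;
}

// Hypothetical helper: true when the least significant byte of an
// unsigned long long is stored at the lowest address.
bool is_little_endian()
{
    return bytes_in_memory(1ULL)[0] == 1;
}
```

With this, bytes_in_memory(0x7766554433221100)[0] would be 0x00 when is_little_endian() returns true and 0x77 on a big-endian machine, matching the two listings above.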

Upvotes: 2

Jonathan Overholt

Reputation: 66

The problem in func0 is that your hex literal, despite being written with 16 hex digits, is interpreted as an int because it has no suffix asking for a wider type. Using 0xffULL (0xff as unsigned long long) instead of 0x00000000000000ff should get you what you want.

The clue was that it worked perfectly for the first 32 bits and fell down after that. I'm at a loss to explain where the A at byte 7 came from, though.
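The type of each literal can be checked at compile time. The sketch below is only an illustration: byte_mask is a hypothetical helper name, and the first static_assert relies on the rule that an unsuffixed hex literal takes the first type that can hold its value, starting from int.

```cpp
#include <cassert>
#include <type_traits>

// Leading zeros do not change the value of the literal, so they do not
// change its type either: 0xFF fits in an int.
static_assert(std::is_same<decltype(0x00000000000000FF), int>::value,
              "leading zeros do not widen the literal");

// The ULL suffix forces the literal to be unsigned long long.
static_assert(std::is_same<decltype(0xFFULL), unsigned long long>::value,
              "the ULL suffix selects unsigned long long");

// Hypothetical helper: builds the mask that selects byte n of a value,
// doing the shift in unsigned long long rather than in int.
unsigned long long byte_mask(unsigned n)
{
    assert(n < sizeof(unsigned long long));
    return 0xFFULL << (8 * n);
}
```

Because the shift happens in unsigned long long, byte_mask(4) yields 0xFF00000000 instead of running past the width of an int.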

Upvotes: 2

6502

Reputation: 114579

The problem is that in the code

 0x00000000000000FF << offset

the literal 0xFF on the left is just an int (no matter how many leading zeros you write), and left-shifting it gives an int. Shifting by a count greater than or equal to the width of the type is undefined behavior, so the result is not portable.

Using instead:

 0xFFull << offset

solves the issue (because the suffix ull tells the compiler to treat the literal as an unsigned long long).

Of course, as said in another answer, (number >> (offset * 8)) & 0xFF is simpler and works.
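For comparison, here is a minimal sketch of both variants side by side. The function names are invented for this example, and the parameter is taken as unsigned long long (an assumption, to keep the right shift free of sign-extension questions).

```cpp
#include <cassert>
#include <cstddef>

// Mask first, then shift down: the shape of the original func0,
// but with a mask that is an unsigned long long.
unsigned char byte_mask_then_shift(unsigned long long number, std::size_t offset)
{
    assert(offset < sizeof(number));
    const std::size_t bits = offset * 8;
    return static_cast<unsigned char>((number & (0xFFull << bits)) >> bits);
}

// Shift down first, then mask: no wide mask is needed at all.
unsigned char byte_shift_then_mask(unsigned long long number, std::size_t offset)
{
    assert(offset < sizeof(number));
    return static_cast<unsigned char>((number >> (offset * 8)) & 0xFF);
}
```

Both should return 'A' for every offset of 4702111234474983745 (eight bytes of 0x41), and for 280267669825 they should return 'A' for offsets 0 to 4 and 0 for offsets 5 to 7.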

Upvotes: 5

Nicol Bolas

Reputation: 474256

This is an extremely overcomplicated way to do something very simple. You don't need to even consider endian issues, because you don't need to access the memory representation of a long long just to get a byte.

Getting the n-th byte is simply a matter of masking away all other bytes and doing a conversion of that value to an unsigned char. So like this:

unsigned char nth_byte(unsigned long long int value, int n)
{
  //Assert that n is in the range [0, 8)
  value = value >> (8 * n);   //Move the desired byte into the first byte.
  value = value & 0xFF;      //Mask away everything that isn't the first byte.
  return static_cast<unsigned char>(value); //Return the first byte.
}

Upvotes: 8
