Reputation: 147

Pointer reference and dereference

I have the following code:

#include <iostream>

char ch[] = "abcd";

int main() {
    std::cout << (long)(int*)(ch+0) << ' '
         << (long)(int*)(ch+1) << ' '
         << (long)(int*)(ch+2) << ' '
         << (long)(int*)(ch+3) << std::endl;

    std::cout << *(int*)(ch+0) << ' '
         << *(int*)(ch+1) << ' '
         << *(int*)(ch+2) << ' '
         << *(int*)(ch+3) << std::endl;
    std::cout << int('abcd') << ' '
         << int('bcd') << ' '
         << int('cd') << ' '
         << int('d') << std::endl;
}

My question is why dereferencing the pointer to 'd' gives 100. I think it should be:

int('d') << 24; //plus some trash on stack after ch

My other question: why are the second and third lines of the output different?

6295640 6295641 6295642 6295643

1684234849 6579042 25699 100

1633837924 6447972 25444 100

Thanks.

Upvotes: 2

Answers (3)

4pie0

Reputation: 29754

int('d') is the character 'd' converted to int; its decimal value is 100, as you can check in an ASCII table.

Besides this, your pointer arithmetic is not correct: ch holds 5 bytes (four letters plus the terminating '\0'), so with a 4-byte int the reads at ch + 2 and ch + 3 go past the end of the array.

"So why is the last number of the second row 100? It should be 100 << 24 plus some trash."

Possibly you read the bytes 100, 0, 0, 0 (though any garbage is possible in the three bytes after the 100), and they are read as 100 because of endianness. That is also why the 3rd entry is (int)('d'*256 + 'c') = 25699 and not 'c'*256 + 'd'.

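And in case someone is interested in why the third row differs, e.g. why int('cd') = 'c'*256 + 'd' = 25444 while *(int*)(ch+2) = 'd'*256 + 'c' = 25699: the third row uses multicharacter literals, whose value is implementation-defined.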

C++ Standard n3337 § 2.14.3/1

(...)An ordinary character literal that contains a single c-char has type char, with value equal to the numerical value of the encoding of the c-char in the execution character set. An ordinary character literal that contains more than one c-char is a multicharacter literal. A multicharacter literal has type int and implementation-defined value.(...)

Upvotes: 2

Dr. Debasish Jana

Reputation: 7128

The code does not compile without warnings:

warning: multi-character character constant [-Wmultichar]

The output is:

6296232 6296233 6296234 6296235
1684234849 6579042 25699 100
1633837924 6447972 25444 100

Explanation: for the 1st line, presuming ch is located at address 6296232, the addresses ch, ch+1, ch+2 and ch+3 are printed.

For the 2nd line, presuming an int is 4 bytes, the machine is little-endian, and the bytes right after the array happen to be 0:

1st entry is: (int)('d'*256*256*256 + 'c'*256*256 + 'b'*256 + 'a') = 1684234849
2nd entry is: (int)('d'*256*256 + 'c'*256 + 'b') = 6579042
3rd entry is: (int)('d'*256 + 'c') = 25699
4th entry is: (int)('d') = 100 (ASCII value of 'd')
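The arithmetic above can be reproduced without casting the pointer at all. This is only a minimal sketch, assuming ASCII, a little-endian byte order and a 4-byte int as in this answer; load_le32 and the zero-padded buffer padded are hypothetical names introduced for illustration, and the zeros stand in for whatever bytes happened to follow ch in the asker's run.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// "abcd" followed by explicit zero bytes, so every 4-byte window is in bounds.
// The zeros model what this particular run found after ch; real memory after
// the array may hold anything.
const unsigned char padded[8] = {'a', 'b', 'c', 'd', 0, 0, 0, 0};

// Combine 4 bytes starting at `offset`, least significant byte first
// (the order a little-endian CPU uses when it loads an int).
std::uint32_t load_le32(const unsigned char* bytes, std::size_t size,
                        std::size_t offset) {
    assert(offset + 4 <= size);  // never read past the buffer
    std::uint32_t value = 0;
    for (std::size_t i = 0; i < 4; ++i) {
        value |= static_cast<std::uint32_t>(bytes[offset + i]) << (8 * i);
    }
    return value;
}
```

Under those assumptions, load_le32(padded, sizeof padded, k) for k = 0, 1, 2, 3 gives 1684234849, 6579042, 25699 and 100, the same numbers as the second output line.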

For the 3rd line, presuming an int is 4 bytes (the value of a multicharacter literal is implementation-defined; this is what the compiler used here produces):

1st entry is: (int)('d' + 'c'*256 + 'b'*256*256 + 'a'*256*256*256) = 1633837924
2nd entry is: (int)('d' + 'c'*256 + 'b'*256*256) = 6447972
3rd entry is: (int)('d' + 'c'*256) = 25444
4th entry is: (int)('d') = 100 (ASCII value of 'd')
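The sums for the 3rd line describe what this compiler did, not something C++ guarantees. A minimal sketch of that composition, assuming ASCII and 8-bit chars; pack_multichar is a hypothetical helper written for this illustration, not a compiler or library function.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Fold the characters left to right: shift the running value up by one byte
// and merge in the next character, so the LAST character lands in the lowest
// byte. This mirrors the sums listed above for the 3rd output line.
std::uint32_t pack_multichar(const std::string& chars) {
    assert(chars.size() <= 4);  // more characters would not fit in 32 bits
    std::uint32_t value = 0;
    for (unsigned char c : chars) {
        value = (value << 8) | c;
    }
    return value;
}
```

With this scheme pack_multichar("abcd") is 1633837924 and pack_multichar("cd") is 25444, so the first character is the most significant one. That is the opposite of the byte order a little-endian load sees in memory, which is why the 2nd and 3rd output lines differ.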

Upvotes: 0

Matthieu M.

Reputation: 300349

Well, you are invoking undefined behavior; what sane answer do you expect? ;)

The second row is invoking undefined behavior:

std::cout << *(int*)(ch+0)

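is alright as far as bounds are concerned, because there are indeed sizeof(int) bytes worth of data at ch+0 (strictly speaking, reading a char array through an int* still violates the aliasing rules and may be misaligned), however: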

*(int*)(ch+2)

and

*(int*)(ch+3)

involve reading past the end of the array whenever sizeof(int) is 4 bytes or more (and most compilers/platforms use 4 bytes).
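Which offsets stay inside the array can be checked mechanically. A minimal sketch, assuming the same declaration of ch as in the question; int_read_in_bounds is a hypothetical helper introduced here. Note that the literal "abcd" produces a 5-byte array because of the terminating '\0'.

```cpp
#include <cstddef>

char ch[] = "abcd";  // 5 bytes: 'a' 'b' 'c' 'd' '\0'

// True when sizeof(int) bytes starting at ch + offset lie entirely inside ch.
// This checks bounds only; it says nothing about alignment or aliasing.
constexpr bool int_read_in_bounds(std::size_t offset) {
    return offset + sizeof(int) <= sizeof(ch);
}

static_assert(sizeof(ch) == 5, "four letters plus the terminator");
```

With a 4-byte int, offsets 0 and 1 pass the check (the read at ch+1 ends on the terminator), while offsets 2 and 3 fail it.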

So, why do you expect garbage after the array? Why is it not acceptable to have bytes with a value of 0?

It's undefined behavior, thus anything is, by definition, acceptable. Including 0.

And thus you are reading (100, 0, 0, 0) as an integer, which is displayed as 100.

Why 100 and not 100 << 24, you ask?

Well, this is a matter of Endianness. If your platform is little-endian, then (100, 0, 0, 0) is interpreted as 100 and if it is big-endian then (100, 0, 0, 0) is interpreted as 100 << 24.
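The two interpretations can be compared side by side. A minimal sketch, assuming a 32-bit unsigned integer stands in for int; sample, from_little_endian, from_big_endian and native_value are hypothetical names for this illustration. The native read uses std::memcpy because, unlike the pointer cast in the question, copying bytes into an integer is well defined.

```cpp
#include <cstdint>
#include <cstring>

// The four bytes discussed above: 'd' (100) followed by three zero bytes.
const unsigned char sample[4] = {100, 0, 0, 0};

// Little-endian: the first byte in memory is the least significant.
std::uint32_t from_little_endian(const unsigned char (&b)[4]) {
    return static_cast<std::uint32_t>(b[0])
         | static_cast<std::uint32_t>(b[1]) << 8
         | static_cast<std::uint32_t>(b[2]) << 16
         | static_cast<std::uint32_t>(b[3]) << 24;
}

// Big-endian: the first byte in memory is the most significant.
std::uint32_t from_big_endian(const unsigned char (&b)[4]) {
    return static_cast<std::uint32_t>(b[0]) << 24
         | static_cast<std::uint32_t>(b[1]) << 16
         | static_cast<std::uint32_t>(b[2]) << 8
         | static_cast<std::uint32_t>(b[3]);
}

// What this machine sees when the same bytes are loaded as one integer.
std::uint32_t native_value(const unsigned char (&b)[4]) {
    std::uint32_t value = 0;
    std::memcpy(&value, b, sizeof value);
    return value;
}
```

Here from_little_endian(sample) is 100 and from_big_endian(sample) is 100 << 24; comparing native_value(sample) against the two tells you which byte order the machine uses.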

You seem to be on a little-endian platform: all x86 and x86_64 CPUs such as Intel/AMD are little-endian.

Note: in std::cout << (long)(int*)(ch+0) the cast to long is unnecessary: ostream can display void const*, and there is an implicit conversion from T* to void*, so you would get the address without the cast.
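A minimal sketch of printing an address without going through long; address_text and chars_text are hypothetical helpers for this illustration. For the int* in the question the conversion to void const* happens implicitly, but a char* needs the explicit cast, because operator<< otherwise picks the C-string overload and prints the characters instead.

```cpp
#include <sstream>
#include <string>

// Format the address itself. The cast selects the const void* overload of
// operator<<; the exact text (usually hexadecimal) is implementation-defined.
std::string address_text(const char* p) {
    std::ostringstream out;
    out << static_cast<const void*>(p);
    return out.str();
}

// Without the cast, a char pointer is treated as a NUL-terminated string.
std::string chars_text(const char* p) {
    std::ostringstream out;
    out << p;
    return out.str();
}
```

If an integer is really wanted, std::uintptr_t from <cstdint> is a safer target than long, which is narrower than a pointer on some 64-bit platforms.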

Upvotes: 0
