Reputation: 4237

Understanding Endianness - a variable value

I'm using a piece of code (found elsewhere on this site) that checks endianness at runtime.

static bool isLittleEndian()
{
  short int number = 0x1;
  char *numPtr = (char*)&number;

  std::cout << numPtr << std::endl;
  std::cout << *numPtr << std::endl;

  return (numPtr[0] == 1);
}

When in debug mode, the value numPtr looks like this: 0x7fffffffe6ee "\001"

I assume the first hexadecimal part is the pointer's memory address, and the second part is the value it holds. I know that \0 is null termination in old-style C++, but why is it at the front? Is it to do with endianness?
On a little-endian machine: 01 the first byte and therefore least significant (byte place 0), and \0 the second byte/final byte (byte place 1)?

In addition, the cout statements do not print the pointer address or its value. Reasons for this?

Upvotes: 3

Answers (7)

James Kanze

Reputation: 154047

For starters: this type of function is totally worthless: on a machine where sizeof(int) is 4, there are 24 possible byte orders. Most, of course, don't make sense, but I've seen at least three. And endianness isn't the only thing which affects integer representation. If you have an int, and you want to get the low order 8 bits, use intValue & 0xFF, for the next 8 bits, (intValue >> 8) & 0xFF.
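As a minimal sketch of the shift-and-mask approach described above: the helper name byteAt is hypothetical (it is not from this answer), and the sketch assumes a 32-bit unsigned value. Because the shift works on the value rather than on its bytes in memory, the result should not depend on the machine's byte order.

```cpp
#include <cstdint>

// Hypothetical helper (illustration only): returns byte `index` of `value`,
// where index 0 is the least significant byte. `index` must be 0..3.
inline std::uint8_t byteAt(std::uint32_t value, unsigned index)
{
    // The shift operates on the numeric value, not on the memory layout,
    // so this gives the same answer on little- and big-endian machines.
    return static_cast<std::uint8_t>((value >> (8 * index)) & 0xFF);
}
```

With this, byteAt(0x12345678, 0) is 0x78 and byteAt(0x12345678, 3) is 0x12 regardless of how the int is laid out in memory.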

With regard to your precise question: I presume what you are describing as "looks like this" is what you see in the debugger when you break at the return. In this case, numPtr is a char* (an unsigned char const* would make more sense), so the debugger assumes a C style string. The 0x7fffffffe6ee is the address; what follows is what the debugger sees as a C style string, which it displays as a string, i.e. "...". Presumably, your platform is a traditional little-endian one (Intel), so the pointer to the C style string sees the sequence (numeric values) 1, 0. The 0 is of course the equivalent of '\0', so the debugger considers this a one character string, with that one character having the encoding 1.

There is no printable character with an encoding of 1, and it doesn't correspond to any of the normal escape sequences (e.g. '\n', '\t', etc.) either. So the debugger outputs it using the octal escape sequence, a '\' followed by 1 to 3 octal digits. (The traditional '\0' is just a special case of this: a '\' followed by a single octal digit.) And it outputs 3 digits because (probably) it doesn't want to look ahead to ensure that the next character isn't an octal digit. (If the sequence were the two bytes 1, 49, for example, 49 is '1' in the usual encodings, and if it output only a single digit for the octal encoding of 1, the result would be "\11", which is a single character string, corresponding in the usual encodings to '\t'.)

So you get an opening " (this is a string), then \001 (a first character with an encoding of 1 and no displayable representation), and a closing " (that's the end of the string).
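To make the escaping rule concrete, here is a rough sketch of how such a display could be produced. The function escapeCString is hypothetical and is not the debugger's actual code; it assumes ASCII, and for brevity it writes every non-printable byte as three octal digits instead of using the short escapes like '\n' or '\t'.

```cpp
#include <cstdio>
#include <string>

// Hypothetical helper (illustration only, assumes ASCII): renders a
// NUL-terminated C string in quotes, escaping non-printable bytes as
// a backslash followed by exactly three octal digits.
inline std::string escapeCString(const char* s)
{
    std::string out = "\"";
    for (; *s != '\0'; ++s) {
        unsigned char c = static_cast<unsigned char>(*s);
        if (c == '\\' || c == '"') {
            out += '\\';                      // escape the quote and backslash
            out += static_cast<char>(c);
        } else if (c >= 0x20 && c < 0x7F) {
            out += static_cast<char>(c);      // printable ASCII: emit as-is
        } else {
            char buf[5];                      // backslash + 3 digits + NUL
            std::snprintf(buf, sizeof buf, "\\%03o", static_cast<unsigned>(c));
            out += buf;
        }
    }
    out += '"';
    return out;
}
```

Always emitting three digits is what keeps the bytes 1, 49 unambiguous: they come out as "\0011", which cannot be misread as "\11".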

Upvotes: 1

Benjamin Lindley

Reputation: 103751

In addition, the cout statements do not print the pointer address or its value. Reasons for this?

Because chars and char pointers are treated differently than integers when it comes to printing.

When you print a char, it prints the character from whatever character set is being used. Usually, this is ASCII, or some superset of ASCII. The value 0x1 in ASCII is non-printing.

When you print a char pointer, it doesn't print the address, it prints it as a null-terminated string.

To get the results you desire, cast your char pointer to a void pointer, and cast your char to an int.

std::cout << (void*)numPtr << std::endl;
std::cout << (int)*numPtr << std::endl;

Upvotes: 0

Lindydancer

Reputation: 26164

The others have given you a clear answer to what "\001" means, so this is an answer to your question:

On a little-endian machine: 01 the first byte and therefore least significant (byte place 0), and \0 the second byte/final byte (byte place 1)?

Yes, this is correct. If you look at a value like 0x1234, it consists of two bytes, the high part 0x12 and the low part 0x34. The term "little endian" means that the low part is stored first in memory:

addr:   0x34
addr+1: 0x12
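If you want to see this layout on your own machine, something like the following sketch should work. The helper name bytesInMemory is made up for illustration, and it assumes std::uint16_t is available on your platform.

```cpp
#include <cstdint>
#include <cstdio>
#include <cstring>
#include <string>

// Hypothetical helper (illustration only): returns the two bytes of a
// 16-bit value as hex text, in the order they are stored in memory.
inline std::string bytesInMemory(std::uint16_t value)
{
    unsigned char bytes[sizeof value];
    std::memcpy(bytes, &value, sizeof value);   // copy the object representation
    char buf[16];
    std::snprintf(buf, sizeof buf, "%02x %02x",
                  static_cast<unsigned>(bytes[0]),
                  static_cast<unsigned>(bytes[1]));
    return buf;
}
```

On a little-endian machine bytesInMemory(0x1234) should give "34 12", matching the layout above; on a big-endian machine it should give "12 34".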

Did you know that the term "endian" predates the computer industry? It was originally used by Jonathan Swift in his book Gulliver's Travels, where it described whether people ate their eggs from the pointy end or the round end.

Upvotes: 2

Kyle Jones

Reputation: 5532

The \0 isn't a NUL. The debugger is showing you numPtr as a string, the first character of which is \001, or control-A in ASCII. The second character is \000, which isn't displayed because NULs aren't shown when displaying strings. The two character string version of 'number' would appear as "\000\001" on a big-endian machine, instead of "\001\000" as it appears on little-endian machines.

Upvotes: 0

Wyzard

Reputation: 34581

That's not a \0 followed by "01", it's the single character \001, which represents the number 1 in octal. That's the only byte "in" your string. There's another byte after it with the value zero, but you don't see that since it's treated as the string terminator.

Upvotes: 1

Some programmer dude

Reputation: 409472

The "\001" you are seeing is just one byte. It's probably octal notation, which needs three digits to properly express the (decimal) values 0 to 255.

Upvotes: 0

Eugen Rieck

Reputation: 65342

The easiest way to check for endianness is to let the system do it for you:

// htonl() is declared in <arpa/inet.h> on POSIX systems (<winsock2.h> on Windows).
if (htonl(0xFFFF0000) == 0xFFFF0000) printf("Big endian");
else printf("Little endian");

Upvotes: 1
