user8424470

Confusion about the necessity of the null-character?

I am reading about why exactly null characters are needed, and I found this answer, which made some sense to me. It states that the null character is needed because char arrays (for C strings) are often allocated much larger than the actual strings, so you need a way to mark the end.

But why aren't these arrays just constructed with a size deduced from the initializer (without the null character that is implicitly added when initializing from a string literal)? If the arrays holding the strings were constructed using size deduction, there would be no need for the null character: the array would be no bigger than the string, so the string would simply end where the array ends.

Upvotes: 0

Views: 137

Answers (5)

eerorika

Reputation: 238351

I am reading about why exactly there is a need for null-characters, and then I found this answer which made somewhat sense to me. It states that it is needed because char arrays (for the C strings) are often allocated much larger than the actual strings and you thereby need a way to symbolize the end.

That answer is misleading. That's not really the reason why null termination is needed. The accepted answer, which has more upvotes, is better.

there would not be a need for the null-character because the array was not any bigger than the string, so of course, it would end at the end of that array.

Let us remind ourselves that arrays cannot be passed to functions by value. Even if they could, we wouldn't want that, because copying an entire array into the argument would be slow.

Therefore, there is a need to refer to an array indirectly. Indirection is commonly achieved using pointers (or references). Now, we could have a "pointer to character array of size 42", but that is not very useful because then the argument can only point to strings of one particular size.

Instead, the common approach is to use a pointer to the first element of the array. This is such a common pattern that the language has a rule that lets the name of an array implicitly decay into a pointer to its first element.

But can you tell how big an array is from a pointer to one of its elements? You cannot. You need extra information. The accepted answer to the linked question explains the options that are available for representing the size, and that the designer of C chose the option that uses a terminating character (a convention already used by B, the direct predecessor of C; BCPL, from which B derives, instead stored the length in the first byte of the string).
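To make the loss of size information concrete, here is a minimal sketch. The helper names `array_size` and `length_via_pointer` are invented for this illustration and are not part of any library.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// While the argument is still an array, its size is part of the type,
// so the compiler can deduce N.
template <std::size_t N>
constexpr std::size_t array_size(const char (&)[N]) {
    return N;
}

// After decay only a pointer is left. The size is gone, so the end has to
// be found by scanning for the terminating '\0', which std::strlen does.
std::size_t length_via_pointer(const char* s) {
    assert(s != nullptr);
    return std::strlen(s);
}
```

For `char word[] = "hello";`, `array_size(word)` yields 6 (five letters plus the terminator), while `length_via_pointer(word)` yields 5 and works for an array of any size.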


TL;DR Size information is needed because there is a need to refer to the string indirectly, and that indirection hides the knowledge about the size of the array. Null termination is one way to encode the size information within the content of the string, and it is the way that was chosen by the designer of the C language.

Upvotes: 1

Pete Becker

Reputation: 76305

Strings are often manipulated by creating a char array to hold intermediate results and modifying its contents:

char buffer[128];
strcpy(buffer, "Hello, ");
strcat(buffer, "world");
std::cout << buffer << '\n';

After the call to strcpy, the buffer has 7 characters that we care about; after the call to strcat, it has 12. So the number of characters in the buffer can change, and we need a way of indicating how many of them matter. One convention is to put a character count in the first location in the array, and the actual characters after that. Another convention is to put a marker at the end of the characters that matter. There are tradeoffs here, but the decision in C, which was carried through into C++, was to go with an end marker.
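The counts above can be checked with a small sketch. `BufferUsage` and `measure_buffer_usage` are names made up for this example; the point is only that the capacity stays at 128 while the position of the end marker moves.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Illustrative record of how much of the buffer is in use at each step.
struct BufferUsage {
    std::size_t capacity;      // size of the array itself
    std::size_t after_copy;    // characters in use after strcpy
    std::size_t after_append;  // characters in use after strcat
};

BufferUsage measure_buffer_usage() {
    char buffer[128];
    BufferUsage u{};
    u.capacity = sizeof buffer;

    std::strcpy(buffer, "Hello, ");
    u.after_copy = std::strlen(buffer);    // counts up to the '\0'

    std::strcat(buffer, "world");
    u.after_append = std::strlen(buffer);  // the '\0' has moved further along

    // Sanity check: the text plus its terminator still fits in the array.
    assert(u.after_append < u.capacity);
    return u;
}
```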

Upvotes: 0

Useless

Reputation: 67733

... because that char arrays ... are often allocated much larger than the actual strings

That answer is awful.

C strings can be dynamically allocated, meaning you don't know, before runtime, how long they should be. Instead of pre-allocating a massive array and filling most of it with zeroes, you can just malloc(required_size+1) and stick a single nul character at the end.

Conversely, string literals, which are known at compile time, are definitely not "allocated much larger than the actual strings". There wouldn't be any point, since you know exactly how much space is needed in advance.
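As a rough sketch of the malloc(required_size+1) approach mentioned above, written so that it also compiles as C++: the function name `make_c_string` is hypothetical, and the caller is assumed to release the result with `std::free`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>

// Copies the first `length` bytes of `data` into a freshly allocated,
// null-terminated buffer. Returns nullptr if the allocation fails.
char* make_c_string(const char* data, std::size_t length) {
    assert(data != nullptr);
    // One extra byte for the terminator; nothing else is over-allocated.
    char* buf = static_cast<char*>(std::malloc(length + 1));
    if (buf == nullptr) {
        return nullptr;
    }
    std::memcpy(buf, data, length);
    buf[length] = '\0';
    return buf;
}
```

The length is only known at run time here, yet the allocation is exactly as large as the string plus one byte.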

But why aren't these arrays just constructed with a size deduction based on the initializer

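size_t expected;
/* fd is assumed to be an open file descriptor; read() returns ssize_t */
if (read(fd, &expected, sizeof(expected)) == (ssize_t)sizeof(expected)) {
  char *buf = malloc(expected + 1);  /* one extra byte for the terminator */
  if (buf && read(fd, buf, expected) == (ssize_t)expected) {
    buf[expected] = '\0';
    /* now do something with buf */
  }
  free(buf);  /* free(NULL) is a no-op */
}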

There you go, a dynamically-sized string. What would your "size deduction" be? What is the "initializer"?

I could have written a less-ugly example using std::string, since the question is tagged C++, but it's actually C strings you're specifically asking about, and it doesn't make any real difference.

Upvotes: 0

Jean-Baptiste Yunès

Reputation: 36401

But why aren't these arrays just constructed with a size deduction based on the initializer (without the null-character that actually is implicitly added when assigning directly to string literals).

I suppose you mean why you can't write:

char t[] = "abracadabra";

and the compiler would deduce a size of 11?

Because you have 12 characters, not 11. If the array had size 11, something would be lost: the byte that contains the NUL would not be part of the array, and the compiler could not tell the difference between:

char t[] = "abracadabra"; // an array deduced from a C-string literal

and

char t[11] = { 'a', 'b', 'r', 'a', 'c', 'a', 'd', 'a', 'b', 'r', 'a' }; // a "real" array, not a C-string!

The first would have to release 12 bytes at the end of scope and the second 11.
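The two sizes can be verified at compile time. In the sketch below, the array names and the helper `text_length` are invented for illustration; `text_length` assumes the caller passes the capacity separately, which is exactly the extra information an unterminated array needs.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Size deduction from a literal already happens, and it counts the '\0'.
constexpr char with_terminator[] = "abracadabra";
static_assert(sizeof(with_terminator) == 12, "11 letters plus the terminator");

// Listing the characters one by one gives 11 bytes and no terminator.
constexpr char without_terminator[] = {'a', 'b', 'r', 'a', 'c', 'a',
                                       'd', 'a', 'b', 'r', 'a'};
static_assert(sizeof(without_terminator) == 11, "11 letters, nothing else");

// Length of the text in a buffer that may or may not be terminated.
// Scans at most `capacity` bytes, so it never reads past the array.
std::size_t text_length(const char* data, std::size_t capacity) {
    assert(data != nullptr);
    const void* nul = std::memchr(data, '\0', capacity);
    if (nul == nullptr) {
        return capacity;  // no terminator: the whole array is text
    }
    return static_cast<std::size_t>(static_cast<const char*>(nul) - data);
}
```

Calling `std::strlen` on `without_terminator` would be undefined behavior, because nothing marks where the text stops.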

Historically, arrays are just a kind of syntactic sugar on top of pointer arithmetic.

Upvotes: 0

Yury Schkatula

Reputation: 5369

Historically, strings stored in arrays have been marked with a terminating symbol. The reason is simple: instead of passing two values (the head of the array and its length), you need to pass just one, the head of the array. This simplifies the call signature but places some requirements on the caller.

In C and C++, the null character is the termination symbol, so the string functions of the runtime library work on the assumption that the very first null character they encounter marks the end of the string. At the same time, at the application level the terminating symbols may differ: for example, in HTTP headers the CR-LF-CR-LF sequence marks the end of the headers, while a single CR-LF merely ends one line and starts the next.
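The HTTP case can be sketched the same way as a search for a terminator, only a four-byte one. The function name `header_block_size` is made up for this example, and it is not a real HTTP parser.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Returns the number of bytes occupied by the header block, including the
// CR-LF-CR-LF that ends it, or std::string::npos if the terminator has not
// been received yet.
std::size_t header_block_size(const std::string& received) {
    static const std::string terminator = "\r\n\r\n";
    const std::size_t pos = received.find(terminator);
    if (pos == std::string::npos) {
        return std::string::npos;
    }
    const std::size_t size = pos + terminator.size();
    assert(size <= received.size());  // the terminator lies inside the data
    return size;
}
```

Everything after the returned offset belongs to the message body; a single CR-LF inside the headers does not stop the search.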

Upvotes: 0
