dr15

Reputation: 29

K&R - section 1.9: understanding character arrays (and incidentally buffers)

Let's start with a very basic question about character arrays that I could not understand from the description in the book:

  • Does every character array end with '\0'?
  • Is the length of it always equal to the number of characters + 1 for '\0'? Meaning that if I specify a character array length of 10 I would be able to store only 9 characters that are not '\0'?
  • Or does the '\0' come after the last array slot, so all 10 slots could be used for any character and an 11th non-reachable slot would contain the '\0' char?

Going further into the example in this section, it defines a getline() function that reads a line into a character array and returns the number of characters read. (In my full program getline() was renamed to gline(), since getline() is already declared in newer stdio.h headers.)
Here's the function:

int getline(char s[], int lim) {
    int c, i;

    for (i = 0; i < lim - 1 && (c = getchar()) != EOF && c != '\n'; ++i) {
        s[i] = c;
    }

    if (c == '\n') {
        s[i] = c;
        ++i;
    }

    s[i] = '\0';
    return i;
}

It is explained that the array stores the input in this manner:
[h][e][l][l][o][\n][\0]
and the function will return a count of 6, including the '\n' char, but this is only true if the loop exits because of a '\n' char.
If the loop exits because it has reached its limit, it will return an array like this (as I understand it):
[s][n][a][z][z][y][\0]
Now the count will also be 6.
Comparing both strings will return that they're equal when clearly "snazzy" is a longer word than "hello", and so this code has a bug (by my personal requirements, as I would like to not count '\n' as part of the string).

To fix this I tried (among many other things) removing the code that adds the '\n' char to the array and increments the counter. I found out incidentally that when I enter more characters than the array can store, the extra characters wait in the input buffer and are passed to the next getline() call, so if I enter:
"snazzy lolz\n"
it would use it up like this:
first getline() call: [s][n][a][z][z][y][\0]
second getline() call: [ ][l][o][l][z][\n][\0]

This change also introduced an interesting bug: if I enter a string that is exactly 7 characters long (including '\n'), the program quits straight away, because it would pass a '\0' char to the next getline() call, which would return 0 and exit the while loop that calls getline() in main().

I am now confused as to what to do next. How can I make it not count the '\n' char but also avoid the bug it created?

Many thanks

Upvotes: 2

Views: 413

Answers (2)

Clifford

Reputation: 93514

Does every character array end with '\0'?

No; strings are a special case - they are character arrays with a nul (\0) terminator. This is more a convention than a feature of the language, although it is part of the language insofar as literal constant strings have a nul terminator. Moreover, in a character string the nul appears at the end of the string, not the end of the array - the array holding the string may be longer than the string it holds.

So the nul merely indicates the end of a string in a character array. If the character array represents data other than a string, then it may contain zero elements anywhere.
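As a minimal sketch of that distinction (the names greeting, raw and show_sizes are made up for this illustration), the following compares the size of an array with the length of the string it holds:

```c
#include <stdio.h>
#include <string.h>

/* Illustrative names, not taken from the question. */
static const char greeting[16] = "hello";            /* a string inside a larger array */
static const char raw[4] = { 'a', 'b', 'c', 'd' };   /* plain data, no nul terminator */

/* Prints the size of each array and the length of the one string. */
void show_sizes(void)
{
    printf("greeting: array size %zu, string length %zu\n",
           sizeof greeting, strlen(greeting));
    /* strlen(raw) would be undefined: raw contains no nul. */
    printf("raw: array size %zu\n", sizeof raw);
}
```

The array greeting has 16 elements, but the string in it ends at the nul in greeting[5]; the remaining elements are simply unused by the string.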

Is the length of it always equal to the number of characters + 1 for '\0'?

Again you are conflating strings with character arrays. They are not the same. A string happens to use a character array as a container. A string requires an array that is at least the length of the string plus one.

meaning that if I specify a character array length of 10 I would be able to store only 9 characters that are not '\0'?

You will be able to store 10 characters of any value. If however you choose to interpret the array as a string, the string comprises only those characters up to and including the first nul character.

or does the '\0' come after the last array slot, so all 10 slots could be used for any character and an 11th non-reachable slot would contain the '\0' char?

The nul is at the end of the string, not the end of the array, and certainly not after the end of the array.
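A small sketch of the 10-slot case, assuming hypothetical helpers fill_raw and fill_string: all ten slots can hold arbitrary bytes, but the longest string that fits has nine characters, with the nul in the tenth slot.

```c
#include <string.h>

enum { SLOTS = 10 };

/* Fills all ten slots with non-nul bytes: valid raw data, but not a string. */
void fill_raw(char a[SLOTS])
{
    memset(a, 'x', SLOTS);
}

/* Stores the longest string that fits: nine characters plus the nul in a[9]. */
void fill_string(char a[SLOTS])
{
    strcpy(a, "123456789");
}
```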

Comparing both strings will return that they're equal when clearly "snazzy" is a longer word than "hello",

In what world are those strings equal? They have equal length, not equal content.
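To make the difference concrete, here is a minimal sketch using two hypothetical helpers, same_length and same_content, built on the standard strlen() and strcmp():

```c
#include <string.h>

/* Returns 1 when the two strings have the same length, 0 otherwise. */
int same_length(const char *a, const char *b)
{
    return strlen(a) == strlen(b);
}

/* Returns 1 when the two strings have the same content, 0 otherwise. */
int same_content(const char *a, const char *b)
{
    return strcmp(a, b) == 0;
}
```

For "hello\n" and "snazzy", same_length() returns 1 (both are six characters long) while same_content() returns 0.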

and so this code has a bug (by my personal requirements, as I would like to not count '\n' as part of the string).

Someone else's code not doing what you require is hardly a bug; that implementation is by design and is identical to the behaviour of the standard library fgets() function. If you require different behaviour, then you are of course free to implement to your needs; just omit the part:

if (c == '\n') {
    s[i] = c;
    ++i;
}

To explicitly flush any remaining characters in the buffer the removed code above may be replaced with:

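/* Discard the rest of the line; stop at end of input too,
   otherwise the loop never ends when EOF is reached first. */
while (c != '\n' && c != EOF) {
    c = getchar();
}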

One reason why you might not do that is that the data may be coming from a file redirected to stdin.
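Putting those pieces together, one possible sketch of a variant that never stores or counts the '\n' follows. Several things here are assumptions rather than the book's code: it takes a FILE * parameter instead of calling getchar() (so it can be tried on any stream), it returns -1 at end of input so that an empty line (0) is distinguishable from no more input, and it assumes lim is at least 1.

```c
#include <stdio.h>

/* Hypothetical variant of the book's getline().
   Stores at most lim - 1 characters of the next line in s, never stores
   the '\n', discards the part of the line that does not fit, and returns
   the number of characters stored, or -1 if end of input was reached
   before any character was read. */
int gline(FILE *in, char s[], int lim)
{
    int c;
    int i = 0;
    int seen = 0;   /* set once any character of this line has been read */

    /* Consume the whole line, keeping only what fits. */
    while ((c = getc(in)) != EOF && c != '\n') {
        seen = 1;
        if (i < lim - 1)
            s[i++] = (char)c;
    }
    s[i] = '\0';

    return (c == EOF && !seen) ? -1 : i;
}
```

With this version the calling loop would test for a result >= 0 rather than > 0, so a line whose newline lands exactly at the buffer limit, or an empty line, no longer ends the program early.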

One reason for retaining the '\n' is that it enables detection of incomplete input, which may be useful in some cases. For example, you may want all the data in the line, regardless of its length and despite a necessarily finite buffer length. A string returned without a newline would indicate that there is more data to be read, so you could then write code to handle that situation.
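A minimal sketch of that kind of check, using the standard fgets() and a hypothetical wrapper named read_chunk:

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: reads one chunk with fgets() and reports whether
   it completed a line. Returns 1 if the chunk ends with '\n', 0 if it
   does not (more of the line remains, or input ended without a newline),
   and -1 if nothing could be read. */
int read_chunk(FILE *in, char *buf, int size)
{
    size_t len;

    if (fgets(buf, size, in) == NULL)
        return -1;
    len = strlen(buf);
    return len > 0 && buf[len - 1] == '\n';
}
```

With a 7-element buffer and the input "snazzy lolz\n", the first call stores "snazzy" and returns 0, and the second stores " lolz\n" and returns 1, so the caller can tell the two chunks belong to the same line.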

Upvotes: 2

Paul Ogilvie

Reputation: 25296

There is a convention in C that strings end with a null character. All your questions are based on that convention. So:

  • Does every character array end with '\0'?

No. It ends with \0 only because the programmer put it there.

  • Is the length of it always equal to the number of characters + 1 for '\0'?

Yes, but only because of this convention. That is why, for example, you allocate one more byte (char) than the length of the string, to accommodate this \0.

Strings are stored in character arrays such as char s[32]; or char *s = malloc(strlen(name) + 1);
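The malloc() form can be sketched as a small function; copy_string is a hypothetical name chosen for this example, and the caller is assumed to free() the result:

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns a heap-allocated copy of name,
   or NULL if the allocation fails. The caller must free() it. */
char *copy_string(const char *name)
{
    size_t len = strlen(name);      /* characters, not counting the \0 */
    char *s = malloc(len + 1);      /* one extra byte for the \0 */

    if (s != NULL)
        memcpy(s, name, len + 1);   /* copies the characters and the \0 */
    return s;
}
```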

Upvotes: 4
