kaiseroskilo

Reputation: 1729

Strings without a '\0' char?

If, by mistake, I define a char array with no \0 as its last character, what happens?

I'm asking because I noticed that if I iterate through the array with while(cnt!='\0'), where cnt is an int variable used as an index into the array, and print the cnt values at the same time to monitor what's happening, the iteration stops at the last character +2. The extra characters are of course random, but I don't understand why it has to stop after 2. Does the compiler automatically insert a \0 character? Links to relevant documentation would be appreciated.

To make it clear, here is an example. Let's say that the array str contains the word doh (with no '\0'). Printing the cnt variable at every loop would give me this: doh+ or doh^ and so on.

Upvotes: 8

Views: 7392

Answers (6)

pmg

Reputation: 108986

EDIT (undefined behaviour)

Accessing array elements outside of the array boundaries is undefined behaviour.
Calling string functions with anything other than a C string is undefined behaviour.
Don't do it!

A C string is a sequence of bytes terminated by and including a '\0' (NUL terminator). All the bytes must belong to the same object.
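
A minimal sketch of that definition, assuming you know the size of the object: the helper name is_c_string is made up for illustration and is not a standard function. It only asks whether a '\0' occurs inside the object, and memchr never reads past the size it is given.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not part of the C library).
   Returns 1 if buf[0..size-1] contains a '\0', meaning a C string starts at
   buf and ends inside the same object; returns 0 otherwise.
   memchr examines at most size bytes, so the check itself stays in bounds. */
int is_c_string(const char *buf, size_t size)
{
    return memchr(buf, '\0', size) != NULL;
}
```

For char a[3] = {'d', 'o', 'h'} this returns 0, and for char b[4] = "doh" it returns 1.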


Anyway, what you see is a coincidence!

But it might happen like this:

                        ,------------------ garbage
                        | ,---------------- str[cnt] (when cnt == 4, no bounds-checking)
memory ----> [...|d|o|h|*|0|0|0|4|...]
                  |   |   \_____/  -------- cnt (big-endian, properly 4-byte aligned)
                  \___/  ------------------ str
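
The picture can be modelled without any out-of-bounds access by building the bytes by hand in one buffer. This is only a sketch: model_layout and first_zero are invented names, the '*' stands in for whatever garbage byte happens to be there, and a real compiler is free to lay out str and cnt completely differently.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: fills mem with a model of the layout above:
   'd','o','h', one garbage byte, then the bytes of an int holding cnt.
   mem must have room for 4 + sizeof(int) bytes.
   Returns the number of bytes written. */
size_t model_layout(unsigned char *mem, int cnt)
{
    memcpy(mem, "doh", 3);              /* the three chars, no terminator */
    mem[3] = '*';                       /* stand-in for a garbage byte    */
    memcpy(mem + 4, &cnt, sizeof cnt);  /* object representation of cnt   */
    return 4 + sizeof cnt;
}

/* Hypothetical helper: index of the first zero byte in mem[0..size-1],
   or size if there is none. The scan never leaves the buffer. */
size_t first_zero(const unsigned char *mem, size_t size)
{
    const unsigned char *p = memchr(mem, 0, size);
    return p ? (size_t)(p - mem) : size;
}
```

With cnt equal to 4 and a 4-byte int, first_zero gives 4 on a big-endian machine (the case drawn above) and 5 on a little-endian one. Either way the scan stops just past the garbage byte only because the small int happens to contain zero bytes.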

Upvotes: 9

thkala

Reputation: 86443

As far as most string-handling functions are concerned, strings always stop at a '\0' character. If you miss this null-terminator somewhere, one of three things will usually happen:

  • Your program will continue reading past the end of the string until it finds a '\0' that just happened to be there. There are several ways for such a character to be there, but none of them is usually predictable beforehand: it could be part of another variable, part of the executable code or even part of a larger string that was previously stored in the same buffer. Of course by the time that happens, the program may have processed a significant amount of garbage. If you see lots of garbage produced by a printf(), an unterminated string is a common cause.

  • Your program will continue reading past the end of the string until it tries to read an address outside its address space, causing a memory error (e.g. the dreaded "Segmentation fault" on Linux systems).

  • Your program will run out of space when copying over the string and will, again, cause a memory error.

And no, the C compiler will not normally do anything but what you specify in your program; for example, it won't terminate a string on its own. This is what makes C so powerful and also so hard to code for.
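
One way to stay clear of all three outcomes is to carry the array sizes along and terminate the destination yourself. The following is a sketch under that assumption; copy_terminated is a made-up helper, not a library function.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not part of the C library).
   Copies at most src_size bytes from src, stopping early at a '\0' if one
   is present, into dst. The copy is truncated to fit and dst is always
   terminated. Returns the number of chars copied, not counting the '\0'. */
size_t copy_terminated(char *dst, size_t dst_size,
                       const char *src, size_t src_size)
{
    const char *nul;
    size_t len;

    if (dst_size == 0)
        return 0;                        /* no room even for a terminator */

    nul = memchr(src, '\0', src_size);   /* bounded search, never overruns */
    len = nul ? (size_t)(nul - src) : src_size;

    if (len > dst_size - 1)
        len = dst_size - 1;              /* truncate instead of overflowing */

    memcpy(dst, src, len);
    dst[len] = '\0';                     /* we add the terminator ourselves */
    return len;
}
```

Copying the three unterminated chars 'd', 'o', 'h' into an 8-byte buffer yields the string "doh"; copying them into a 3-byte buffer yields "do", because one byte is reserved for the terminator.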

Upvotes: 3

AnT stands with Russia

Reputation: 320757

In the C language, the term string refers to a zero-terminated array of characters. So, pedantically speaking, there's no such thing as "strings without a '\0' char". If it is not zero-terminated, it is not a string.

Now, there's nothing wrong with having a mere array of characters without any zeros in it, as long as you understand that it is not a string. If you ever attempt to work with such a character array as if it were a string, the behavior of your program is undefined. Anything can happen. It might appear to "work" for some magical reasons. Or it might crash all the time. It doesn't really matter what such a program will actually do, since if the behavior is undefined, the program is useless.
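
As an illustration of treating such an array as an array rather than as a string, here is a small sketch that passes an explicit length. The helper name format_chars is invented for this example. It relies on the precision form of %s: with "%.*s" the function reads at most the given number of chars, so the source does not need a terminator.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper (not part of the C library).
   Formats the first len chars of a plain char array, wrapped in brackets,
   into out, which does become a real terminated string.
   Returns whatever snprintf returns. */
int format_chars(char *out, size_t out_size, const char *chars, size_t len)
{
    /* The precision bounds the read; chars need not contain a '\0'. */
    return snprintf(out, out_size, "[%.*s]", (int)len, chars);
}
```

Called with the three chars 'd', 'o', 'h' and a length of 3, it writes "[doh]" into out. The same array passed to a plain "%s" would be undefined behavior.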

Upvotes: 5

mouviciel

Reputation: 67879

I bet that an int is defined just after your string and that this int takes only small values such that at least one byte is 0.

Upvotes: 0

poundifdef

Reputation: 19380

If you define a char array without the terminating \0 (called a "null terminator"), then your string, well, won't have that terminator. You would do that like so:

char strings[] = {'h', 'e', 'l', 'l', 'o'};

The compiler never automatically inserts a null terminator in this case. The fact that your code stops after "+2" is a coincidence; it could just as easily have stopped at +50 or anywhere else, depending on whether there happened to be a \0 character in the memory following your string.

If you define a string as:

char strings[] = "hello";

Then that will indeed be null-terminated. When you use quotation marks like that in C, then even though you can't physically see it in the text editor, there is a null terminator at the end of the string.

There are some C string-related functions that will automatically append a null-terminator. This isn't something the compiler does, but part of the function's specification itself. For example, strncat(), which concatenates one string to another, will add the null terminator at the end.

However, if one of the strings you use doesn't already have that terminator, then that function will not know where the string ends and you'll end up with garbage values (or a segmentation fault).
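
To make the difference between the two initializers concrete, here is a short sketch. The names brace_init, literal_init and append_chars are made up for illustration. strncat is used with an explicit count, which lets it read from an array that has no terminator as long as the count does not exceed the array's size; the destination must already be a proper string with enough free space.

```c
#include <stddef.h>
#include <string.h>

/* Brace initializer: exactly the five listed chars, no terminator. */
const char brace_init[] = {'h', 'e', 'l', 'l', 'o'};

/* String literal initializer: the five chars plus the implicit '\0'. */
const char literal_init[] = "hello";

/* Hypothetical helper: appends at most n chars of src to the string in dst.
   strncat always writes a '\0' after what it copied. The caller must make
   sure dst is terminated and has room for n more chars plus the '\0'. */
void append_chars(char *dst, const char *src, size_t n)
{
    strncat(dst, src, n);
}
```

Here sizeof brace_init is 5 and sizeof literal_init is 6; the extra byte is the terminator. Starting from char dst[16] = "oh, " and appending brace_init with a count of 5 leaves "oh, hello" in dst.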

Upvotes: 5

SLaks

Reputation: 888195

This would happen if, by coincidence, the byte at *(str + 5) is 0 (the numeric value zero, not the ASCII character '0').

Upvotes: 3
