Andrew-Dufresne

Reputation: 5624

String doesn't end with a null character but still behaves normally. Why?

In the following code, I use strncpy() to copy a string into char *str, which is 10 characters long.

According to the strncpy() manual, "Warning: If there is no null byte among the first n bytes of src, the string placed in dest will not be null terminated." That is exactly what happens here.

The source string is 26 characters long and I copied 10 characters, so no null character is placed at the end of str.

But when I print the contents of str, starting from index 0 until I reach '\0', it behaves normally.

Why? If no '\0' is placed at the end, why does the loop stop at the correct place?

My understanding is that it should cause a "Segmentation fault", or at least not stop there and keep printing garbage values.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#define SIZE 10

int main()
{
    char *str ;
    str = malloc( sizeof( char ) * SIZE );
    if( str == NULL ) 
        exit( 1 );
    memset( str, 0, sizeof( char ) * SIZE );

    strncpy( str, "abcdefghijklmnopqrstuvwxyz", sizeof( char ) * SIZE );

    unsigned int index;
    for( index = 0; str[ index ] != '\0' ; index++ ) {
        printf( "str[ %u ] has got : %c \n ", index, str[ index ] );
    }

    return 0;
}

Here is the output:

 str[ 0 ] has got : a
 str[ 1 ] has got : b
 str[ 2 ] has got : c
 str[ 3 ] has got : d
 str[ 4 ] has got : e
 str[ 5 ] has got : f
 str[ 6 ] has got : g
 str[ 7 ] has got : h
 str[ 8 ] has got : i
 str[ 9 ] has got : j

Any help will be appreciated.

EDIT

Is there a proper way to check whether a string ends with '\0' or not? I always thought the loop above was the ultimate test, but now it seems it isn't.

Let's say we get a string from some function developed by another programmer. How will we know that it ends at the correct place with '\0'? Maybe it doesn't, and then the loop will go beyond the actual size until it hits some '\0'. We can never know the actual size of the string.

So how do we tackle such situation?

Any suggestion?

Upvotes: 6

Views: 4309

Answers (6)

Yantao Xie

Reputation: 12896

I think sharptooth's answer is right: more space is allocated than was requested. I modified the program as follows:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#define SIZE 10

int main()
{
    char *str ;
    int *p;
    int actual_length;
    str = malloc( sizeof( char ) * SIZE );
    if( str == NULL ) 
        exit( 1 );

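    /* Non-portable, undefined behaviour: assumes 32-bit little-endian glibc, where
       the 4 bytes just before the returned pointer hold the chunk's size field
       (chunk size | 1 for the PREV_INUSE flag). This reads its low byte, then
       subtracts the flag and the 4-byte header to get the usable size. */
    actual_length = (int)*(str - 4) - 1 - 4;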
    printf("actual length of str is %d\n", actual_length);
    p = (int*) malloc(sizeof(int));
    if (p == NULL) exit(1);
    *p = -1;
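    /* Also undefined behaviour: p - 1 is the size field of p's own chunk, which
       here sits right after str's usable bytes. Overwriting it corrupts heap
       metadata (so free() must not be called afterwards), but it makes the bytes
       between the two blocks visible to the loop below. */
    char* pc = (char*)(p - 1);
    pc [0] = 'z';
    pc [1] = 'z';
    pc [2] = 'z';
    pc [3] = 'z';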

    memset( str, 0, sizeof( char ) * SIZE );

    memcpy( str, "abcdefghijklmnopqrstuvwxyz", sizeof( char ) * SIZE );

    int i;
    for (i = SIZE; i < actual_length; i++)
     str[i] = 'y';

    unsigned int index;
    for( index = 0; str[ index ] != '\0' ; index++ ) {
        printf( "str[ %u ] has got : %c \n ", index, str[ index ] );
    }

    return 0;
}

The output is:

actual length of str is 12
str[ 0 ] has got : a 
 str[ 1 ] has got : b 
 str[ 2 ] has got : c 
 str[ 3 ] has got : d 
 str[ 4 ] has got : e 
 str[ 5 ] has got : f 
 str[ 6 ] has got : g 
 str[ 7 ] has got : h 
 str[ 8 ] has got : i 
 str[ 9 ] has got : j 
 str[ 10 ] has got : y 
 str[ 11 ] has got : y 
 str[ 12 ] has got : z 
 str[ 13 ] has got : z 
 str[ 14 ] has got : z 
 str[ 15 ] has got : z 
 str[ 16 ] has got : \377 
 str[ 17 ] has got : \377 
 str[ 18 ] has got : \377 
 str[ 19 ] has got : \377 

My OS is Debian Squeeze/sid.

Upvotes: 0

HMage

Reputation: 1591

You're lucky to have a zero byte just beyond the allocated region.

Try this code on other platforms and you'll see it might not behave the same way.

Upvotes: 0

Falaina

Reputation: 6685

As for your edit, I think being pedantic will help elucidate some issues.

In C there is no string type. There is the concept of a "C string", which is what the C standard library works with, and it is defined as nothing more than a NUL-terminated sequence of characters. So there really isn't such a thing as a "non-null-terminated string" in C, and your question is better phrased as "How can I determine whether an arbitrary character buffer is a valid C string?" or "How can I determine whether the string I found is the intended string?"

The answer to the first question, unfortunately, is just to scan the buffer linearly until you encounter a NUL byte, as you are doing. This gives you the length of the C string.

The second question has no easy answer. Because C doesn't have an actual string type with length metadata (or the ability to carry the size of arrays across function calls), there's no real way to determine whether the string length found above is the length of the intended string. It might become obvious if we start seeing segfaults in the program or "garbage" in the output, but in general we're stuck doing string operations by scanning until the first NUL byte (usually with an upper bound on the string length, to avoid messy buffer overrun errors).

Upvotes: 6

gnud

Reputation: 78538

Sharptooth has explained the probable cause of the behaviour, so I won't repeat it.

When allocating buffers, I always over-allocate by a byte, like this:

#define SIZE 10
char* buf = malloc(sizeof(char)*(SIZE+1));
/* error-check the malloc call here */
buf[SIZE] = '\0';

Upvotes: 0

sharptooth

Reputation: 170489

It just happens that there's a null byte right beyond the end of the allocated block.

Most likely malloc() allocates more memory than requested and either puts so-called guard values there that happen to contain null bytes, or stores metadata for later use by free(), and this metadata happens to contain a null byte right at that position.

Either way, you should not rely on this behaviour. You have to request (malloc()) one more byte for the null character so that its location is also legally allocated to you.

There's no portable way to test whether a string is properly null-terminated. Once you're past the end of the allocated block, your program may simply crash. Or there may be a null character somewhere beyond the end of the block, and you later overwrite memory beyond the end of the block when manipulating the misinterpreted string.

Ideally you would need a function that checks whether a given address is allocated to you and belongs to the same allocation as another given address (perhaps the start of the block). Such a check would be slow and not worth it, and there's no standard way to do it.

In other words, if you encounter a string which is meant to be null-terminated but really isn't, you're screwed big time: your program will run into undefined behaviour.

Upvotes: 15

sbi

Reputation: 224089

Why does it work?

The memory you allocate happens to have a '\0' byte at the right place. (For example, memory that the heap manager has freshly obtained from the operating system is typically zero-filled, whereas Visual C++'s Debug-mode heap deliberately fills new blocks with non-zero patterns to expose exactly this kind of bug. But it could just as well be pure luck.)

Is there a proper way to check whether a string ends at '\0' or not?

No. You need your strings to be either zero-terminated (which is what the C standard library's string handling functions expect), or you need to carry their length around in an extra variable. If you have neither of the two, you have a bug.

Now how will we know that some string from some function developed by some other programmer ends at the correct place with '\0'? Maybe it doesn't, and then it will go beyond the actual size until we get some '\0'. We can never know the actual size of the string.

So how do we tackle such situation?

You can't. If the other function screws up that badly, you're screwed that badly.

Upvotes: 4
