Jordan

Reputation: 4628

C: using pointer as string: unpredictable behavior

I'm writing a C program to find the longest line in the user's input and print the line's length and the line itself. It succeeds at counting the characters but unpredictably fails at storing the line itself. Maybe I'm misunderstanding C's memory management and someone can correct me.

EDIT: followup question: I understand now that the blocks following the dummy char are unallocated and thus open range for the computer to do anything with them, but then why does the storage of some chars still work? In the second example I mention, the program stores characters in the 'unallocated' blocks even though it 'shouldn't'. Why?

Variables:

This is how I visualize the memory used by the program's variables:

 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|\n| 7|11|15|c |u |r |r |e |n |t |\0|e |s |t |\0|p |r |e |v |l |o |n |g |e |s |t |\0|
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+

true statements:

&c == 11
&i == 12
&longest_i == 13
&twostr == 14
&dummy == 15

program:

#include <stdio.h>

int main()
{
    char c = '\0';
    int i, longest_i;
    char *twostr;
    longest_i = i = 0;
    char dummy = '\0';
    twostr = &dummy;

    while ((c=getchar()) != EOF)
    {
        if (c != '\n')
        {
            *(twostr+i) = c;
            i++;
        }
        else
        {
            *(twostr+i) = '\0';
            if (i > longest_i)
            {
                longest_i = i;
                for (i=0; (c=*(twostr+i)) != '\0'; ++i)
                    *(twostr+longest_i+1+i) = c;
            }
            i = 0;
        }
    }

    printf("length is %d\n", longest_i);
    for (i=0; (c=*(twostr+longest_i+1+i)) != '\0'; ++i)
        putchar(c);

    return 0;
}

The output from *(twostr+longest_i+1) up to the '\0' is unpredictable. Examples:

input:

longer line
line

output:

length is 11
@

input:

this is a line
this is a longer line
shorter line

output:

length is 21
this is a longer lineÔÿ"

Upvotes: 2

Views: 438

Answers (7)

DRH

Reputation: 8358

First, you will need to make sure that twostr has sufficient space to hold the string that you're managing. You will likely need to add some logic to allocate the initial space and to allocate more space when needed. Something like:

size_t twostrLen = 256;
char* twostr = malloc(twostrLen);

Then, when inserting data into this, you'll need to allocate additional memory if your index would reach or exceed twostrLen:

if (i >= twostrLen) {
   char* tmp = twostr;
   size_t oldLen = twostrLen;
   twostrLen *= 2;
   twostr = malloc(twostrLen);
   memcpy(twostr, tmp, oldLen);  // copy everything stored so far
   free(tmp);
}

Where i is the offset from twostr that you're about to write to.

Finally, when copying from the current string to the longest string, your loop termination condition is (c=*(twostr+i)) != '\0'. The loop exits as soon as c is '\0', which is before the terminating null has been written. The null must be written for the loop that prints the string to work correctly. Adding the following after your innermost for loop should address the issue:

*(twostr+longest_i+1+i) = 0;

Without this, your last loop will continue to read until it happens to encounter a null character. This could be immediately (as seen in your first example where it appears to work), or could be some number of bytes later (like your second example, where additional characters are printed).

Again, remember to check that longest_i+1+i < twostrLen before writing to that location.
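
The grow-on-demand idea above can also be sketched with realloc, which does the allocate, copy, and free steps in one call. This is only an illustrative sketch: the GrowBuf type and the growbuf_push function are hypothetical names invented for this example, and the starting capacity of 16 is an arbitrary assumption.

```c
#include <stdlib.h>

/* Hypothetical helper type: a heap buffer that tracks its own capacity. */
typedef struct {
    char  *data;  /* NUL-terminated contents, or NULL before first use */
    size_t len;   /* characters stored, not counting the terminator */
    size_t cap;   /* bytes currently allocated */
} GrowBuf;

/* Append one character, doubling the capacity when the buffer is full.
   Returns 0 on success, or -1 if allocation fails (buffer left unchanged). */
int growbuf_push(GrowBuf *b, char ch)
{
    if (b->len + 1 >= b->cap) {                 /* need room for ch and '\0' */
        size_t new_cap = b->cap ? b->cap * 2 : 16;
        char *tmp = realloc(b->data, new_cap);  /* realloc(NULL, n) acts like malloc(n) */
        if (tmp == NULL)
            return -1;
        b->data = tmp;
        b->cap = new_cap;
    }
    b->data[b->len++] = ch;
    b->data[b->len] = '\0';                     /* keep the string terminated */
    return 0;
}
```

A caller would start with GrowBuf b = {0};, call growbuf_push(&b, c) for each character read, and call free(b.data) when finished.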

Upvotes: 1

enam

Reputation: 1177

Try the following code. Hope you will get your expected result:

#include <stdio.h>

#define LENGTH 1024

int main()
{
    int c; // int rather than char, so that EOF can be told apart from valid characters
    int i, longest_i;
    char twostr[LENGTH]=""; // twostr is a block of memory 1024 bytes long
    char longest[LENGTH]=""; // so is longest, where we will store the longest string
    longest_i = i = 0;

    while ((c=getchar()) != EOF && i < LENGTH) // we check that i < 1024 so we don't
                                               // go outside the bounds of our arrays
    {
        if (c != '\n')
        {
            *(twostr+i) = c;
            i++;
        }
        else
        {
            twostr[i] = 0;
            if (i > longest_i)
            {
                longest_i = i;
                for (i = 0; twostr[i] != 0; ++i) { // 0 is the same as '\0'
                    longest[i] = twostr[i];
                    twostr[i] = 0; // fill twostr with null characters
                }
            }
            i = 0;
        }
    }

    printf("length is: %d\n", longest_i);
    printf("And the word is: ");
    puts(longest);
    printf("\n");
    return 0;
}

Upvotes: 1

Seth Carnegie

Reputation: 75130

Yes, you are correct in saying that you are misunderstanding C's memory management model.

In the line

*(twostr+i) = c;

for example, this would be right except for the fact that twostr contains the address of a single character, and only *twostr refers to memory that you own. Adding anything except 0 to twostr to get another address and then dereferencing the result produces undefined behaviour, because the memory that belongs to dummy is only 1 byte in size.

So to make a long story short, you need to allocate a chunk of memory to store the string in. It's easiest just to show you how to do it right, so here is the code with corrections made:

#include <stdio.h>

int main()
{
    int c; // int rather than char: getchar() returns an int so that EOF is distinguishable
    int i, longest_i;
    char twostr[1024];       // twostr is a block of memory 1024 bytes long
    char longest[1024] = ""; // so is longest, where we will store the longest string

    longest_i = i = 0;

    while ((c=getchar()) != EOF && i < 1024) // we check that i < 1024 so we don't
                                             // go outside the bounds of our arrays
    {
        if (c != '\n')
        {
            *(twostr+i) = c;
            i++;
        }
        else
        {
            twostr[i] = 0;
            if (i > longest_i)
            {
                longest_i = i;
                for (i = 0; twostr[i] != 0; ++i) { // 0 is the same as '\0'
                    longest[i] = twostr[i];
                    twostr[i] = 0; // fill twostr with null characters
                }
                longest[i] = 0; // terminate the copied string
            }
            i = 0;
        }
    }

    printf("length is %d\n", longest_i);
    for (i=0; longest[i] != 0; ++i)
        putchar(longest[i]);

    return 0;
}

Furthermore, the way you visualise your program's variables is incorrect. It would really be something like this:

Stack:

+---------+
|         |
|         |
|         |
|    c    |   4 bytes
+---------+
|         |
|         |
|         |
|    i    |   4 bytes
+---------+
|         |
|         |
|         |
|longest_i|   4 bytes
+---------+
|         |
|         |
|         |

~~~~~~~~~~~

|         |
|         |
|  twostr |   1024 bytes
+---------+
|         |
|         |
|         |

~~~~~~~~~~~

|         |
|         |
| longest |   1024 bytes
+---------+

Upvotes: 2

codaddict

Reputation: 455142

You are not allocating memory to store the characters read by getchar. Your pointer twostr is a character pointer pointing to a character variable not an array, but you are treating it as a pointer to char array:

char *twostr;
....
char dummy = '\0';
twostr = &dummy;
....
*(twostr+i) = c;  // when i here is > 0 you are accessing invalid memory.

What you need is something like:

char *twostr = malloc(MAX);
// use it.
free(twostr);

Where MAX is defined to be one more than the max length of the string in user input.
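
As a rough sketch of how the malloc/free pattern above might look in a complete routine, the function below finds the longest line using two heap buffers of MAX bytes each. The name longest_line, the value 1024 for MAX, and the use of fgets in place of a getchar loop are all assumptions made for this illustration and are not taken from the question.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MAX 1024  /* assumed bound: longest expected line plus its terminator */

/* Reads lines from `in` and returns a malloc'd copy of the longest one,
   without its trailing newline, or NULL if allocation fails.
   The caller must free() the result. A line longer than MAX-1 characters
   is delivered by fgets in pieces, which this sketch does not handle. */
char *longest_line(FILE *in)
{
    char *line = malloc(MAX);
    char *best = malloc(MAX);
    if (line == NULL || best == NULL) {
        free(line);                 /* free(NULL) is a harmless no-op */
        free(best);
        return NULL;
    }
    best[0] = '\0';
    size_t best_len = 0;

    while (fgets(line, MAX, in) != NULL) {
        size_t len = strcspn(line, "\n");  /* length up to the newline, if any */
        line[len] = '\0';
        if (len > best_len) {
            best_len = len;
            memcpy(best, line, len + 1);   /* the copy includes the terminator */
        }
    }
    free(line);
    return best;
}
```

Calling longest_line(stdin) and printing the result with puts, followed by free, would give the behaviour the question is after for lines shorter than MAX.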

Upvotes: 2

AndersK

Reputation: 36082

twostr points to a single character; however, you are treating it as a buffer.

What you need to do instead is make a buffer which can hold more characters

e.g.

static char dummy[512];
twostr = dummy;

Upvotes: 1

Lloyd Macrohon

Reputation: 1462

You're smashing your stack. You only have 1 byte allocated for char dummy. Really it should be something like:

char dummy[1024];

You also need to make sure you don't write more than 1024 bytes into it, which means at most 1023 characters so that there is room for the null terminator.
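
One way to enforce that limit is to route every write through a small checked helper. The sketch below is illustrative only: append_bounded is a hypothetical name made up for this example, and it always reserves the last slot for the terminator.

```c
#include <stddef.h>

/* Stores ch at buf[*len] only if a slot would still remain for the '\0'.
   Returns 1 if the character was stored, 0 if the buffer is full. */
int append_bounded(char *buf, size_t cap, size_t *len, char ch)
{
    if (cap == 0 || *len + 1 >= cap)
        return 0;               /* no room for ch plus the terminator */
    buf[(*len)++] = ch;
    buf[*len] = '\0';           /* keep the buffer a valid C string */
    return 1;
}
```

With char dummy[1024]; and size_t n = 0;, a call such as append_bounded(dummy, sizeof dummy, &n, c) refuses to store anything once 1023 characters are in the buffer.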

Upvotes: 2

Matt Lacey

Reputation: 8255

You're not actually allocating any memory to write into!

char dummy = '\0'; // creates a char variable and puts \0 into it
twostr = &dummy; // sets twostr to point to the address of dummy

After this, you're simply writing into the memory which comes after the char set aside by dummy, and writing over who-knows-what.

The easiest fix in this case would be to make dummy a pointer to a char, and then malloc a buffer to use for your strings (make it longer than the longest string you expect!)

For instance, buffer below would point to 256 bytes of memory (sizeof(char) is 1 by definition), allowing for a string up to 255 characters long, since the null terminator (\0) has to be stored at the end.

char * buffer = (char *)malloc(sizeof(char) * 256);

Edit: This would allocate memory from the heap, which you should later free up by calling free(buffer); when you're done with it. The alternative is to use up space on the stack as per Anders K's solution.

Upvotes: 4
