SilentDev

Reputation: 22747

String not properly being emptied and assigned when dealing with strcpy(string, "")

Edit: I tried changing the line arr_of_strings[arr_index_count] = first_word; to strcpy(arr_of_strings[arr_index_count], first_word);, but then it gives a segmentation fault after printing "Word is: This".

Edit 2: I am trying to do this without strtok since I figured it would be a good way to learn about C strings.

Trying to learn C on my own. Decided to create a function which takes a string, and places each word in the string into an element in an array. Here is my code:

Assume #define MAX_LENGTH 80, and that the code below runs inside a function with <stdio.h> and <string.h> included.

// char *string_one[unknown_size];

// first_word will represent each word in the sentence
char first_word[MAX_LENGTH + 1] = "";

// this is the array I will store each word in
char *arr_of_strings[MAX_LENGTH];

int index_count = 0;
int arr_index_count = 0;

char sentence[] = "This is a sentence.";

for (int i = 0; i<MAX_LENGTH; i++) {
    printf("Dealing with char: %c\n", sentence[i]); 

    if (sentence[i] == '\0') {
        // end of sentence
        break;
    } else if (sentence[i] ==  ' ') {
        // this signifies the end of a word
        printf("Word is: %s\n", first_word);
        arr_of_strings[arr_index_count] = first_word;
        // after putting the word in the string, make the word empty again
        strcpy(first_word, "");
        // verify that it is empty
        printf("First word is now: %s\n", first_word);

        index_count = 0;
        arr_index_count++;
    } else {
        // not the start of a new string... so keep appending the letter to first_word
        printf("Letter to put in first_word is: %c\n", sentence[i]);
        first_word[index_count] = sentence[i];
        index_count++;
    }
}

printf("-----------------\n");
for (int j = 0; j<=arr_index_count; j++) {
    printf("%s\n", arr_of_strings[j]);
}

What this prints is:

Dealing with char: T
Letter to put in first_word is: T
Dealing with char: h
Letter to put in first_word is: h
Dealing with char: i
Letter to put in first_word is: i
Dealing with char: s
Letter to put in first_word is: s
Dealing with char:  
Word is: This
First word is now: 
Dealing with char: i
Letter to put in first_word is: i
Dealing with char: s
Letter to put in first_word is: s
Dealing with char:  
Word is: isis
First word is now: 
Dealing with char: a
Letter to put in first_word is: a
Dealing with char:  
Word is: asis
First word is now: 
Dealing with char: s
Letter to put in first_word is: s
Dealing with char: e
Letter to put in first_word is: e
Dealing with char: n
Letter to put in first_word is: n
Dealing with char: t
Letter to put in first_word is: t
Dealing with char: e
Letter to put in first_word is: e
Dealing with char: n
Letter to put in first_word is: n
Dealing with char: c
Letter to put in first_word is: c
Dealing with char: e
Letter to put in first_word is: e
Dealing with char: .
Letter to put in first_word is: .
Dealing with char: 
-----------------
sentence.
sentence.
sentence.

If we look here:

First word is now: 
Dealing with char: i
Letter to put in first_word is: i
Dealing with char: s
Letter to put in first_word is: s
Dealing with char:  
Word is: isis
  1. How come, when first_word is empty and we put i and s into it, it becomes isis? (Same with asis.)

  2. How come the word sentence is printed 3 times? My algorithm is clearly flawed, but if anything, shouldn't the word sentence be printed 4 times (once for each word in the sentence: This is a sentence)?

Additionally, I'm just learning C so if there are any other ways to improve the algorithm, please let me know.

Upvotes: 2

Views: 89

Answers (4)

gsamaras

Reputation: 73366

Based on my strtok-free answer, I wrote some code that uses an array of N char pointers, instead of a hardcoded 2D matrix.

char matrix[N][LEN] is a 2D array, capable of storing up to N strings, where every string can be at most LEN - 1 characters long (the last byte of each row is needed for the null terminator). char *ptr_arr[N] is an array of N char pointers. So it can also store up to N strings, but the length of each string is not fixed in advance.

The current approach saves some space by allocating exactly as much memory as every string needs. With a hardcoded 2D array, you would reserve the same amount of memory for every string. So if you assumed that a string needs at most 20 bytes, you would allocate 20 bytes regardless of the string you are storing, which could be much shorter than that or, even worse, much longer. In the latter case, you would need to either cut off the string or, if the code is not written carefully, invoke Undefined Behavior by going out of bounds of the array storing the string.

With the pointers' approach we do not need to worry about that, and can allocate as much space as we need for every string. But, as always, there is a trade-off: we save some space, but we need to allocate memory dynamically (and de-allocate it when we are done with it since, unlike Java for example, C has no garbage collector). Dynamic allocation is a powerful tool, but it requires more development time and care.

So, in my example, we will follow the same logic as before (regarding how we find a word in the string, etc.), but we will be careful about how we store the words in the array of pointers.

Once a word is found and stored in the temporary array word, we can find out its exact length by using strlen(). We then dynamically allocate exactly as much space as that length, plus 1 for the null terminator that every C string must have (the functions in <string.h> rely on it to find the end of a string).

As a result, for storing the first word, "Alexander", we would need to do:

ptr_arr[0] = malloc(sizeof(char) * (9 + 1));

where 9 is the result of strlen("Alexander"). Notice that we ask for a memory block whose size is the size of a char times 10. sizeof(char) is 1 by definition, so in this case it makes no difference, but it is a good habit in general, since you might want to allocate other data types, or even structs.

We make the first pointer of the array point to that memory block we just dynamically allocated. Now this memory block belongs to us, and thus allows us to store data inside it (in our case the word). We do that with strcpy().

Afterwards we go on and print the words.

At this point, in a language like Python for example, you would be finished. But since we dynamically allocated memory, we need to free() it! A common mistake is forgetting to free the memory you asked for.

We do that by calling free() on every pointer that points to memory returned by malloc(). So if we called malloc() 10 times, then free() should be called 10 times as well; otherwise, we leak memory!

Enough talking, here is the code:

#include <string.h>
#include <stdio.h>
#include <stdlib.h>

#define N 100

int fill(char* ptr_arr[N], char* data)
{
    // How many words in 'data'?
    int counter = 0;
    // Array to store current word, assuming max length will be 49 characters (plus '\0')
    char word[50];
    // Counter 'i' for 'word'
    int i;
    // While there is still something to read from 'data' and room left in 'ptr_arr'
    while(*data != '\0' && counter < N)
    {
        // We seek a new word
        i = 0;
        // While the current character of 'data' is not a whitespace or a null-terminator
        while(*data != ' ' && *data != '\0')
        {
            // Copy that character to 'word' if there is room (longer words are truncated),
            // then move to the next character of 'data'.
            if(i < (int)sizeof(word) - 1)
                word[i++] = *data;
            data++;
        }
        // Null-terminate 'word'. 'i' is already at the value we desire, from the loop above.
        word[i] = '\0';
        // If the current character of 'data' is not a null-terminator (thus it's a whitespace)
        if(*data != '\0')
            // Increment the pointer, so that we skip the whitespace (and be ready to read the next word)
            data++;
        // Dynamically allocate space for a word of length `strlen(word)`
        // plus 1 for the null terminator. Assign that memory chunk to the
        // pointer positioned at `ptr_arr[counter]`.
        ptr_arr[counter] = malloc(sizeof(char) * (strlen(word) + 1));
        // malloc() returns NULL when it runs out of memory; stop and report
        // only the words stored so far, so that they can still be freed.
        if(ptr_arr[counter] == NULL)
            break;
        // Now, `ptr_arr[counter]` points to a memory block, that will
        // store the current word.

        // Copy the word to the counter-th element of ptr_arr, and increment the counter
        strcpy(ptr_arr[counter++], word);
    }

    return counter;
}

void print(char* matrix[N], int words_no)
{
   for(int i = 0; i < words_no; ++i)
       printf("%s\n", matrix[i]);
}

void free_matrix(char* matrix[N], int words_no)
{
   for(int i = 0; i < words_no; ++i)
       free(matrix[i]);
}

int main(void)
{
    char data[] = "Alexander the Great";
    // We will store a pointer to each word of 'data' in 'matrix', an array of 'N' char pointers
    char *matrix[N];
    int words_no;
    // 'fill()' populates 'matrix' with 'data' and returns the number of words contained in 'data'.
    words_no = fill(matrix, data);
    print(matrix, words_no);
    free_matrix(matrix, words_no);
    return 0;
}

Output:

Alexander
the
Great

Upvotes: 1

gsamaras

Reputation: 73366

Trying to do this without strtok since I figured it would be a good way to learn about C strings.

Yes, that's the spirit!


I already explained some problems of your code in my previous answer, so now I am going to post a strtok-free solution, which should help you understand what's going on with strings. Basic pointer arithmetic will be used.

Pro-tip: Take a piece of paper, draw the arrays (data and matrix), keep an eye on the values of their counters, and run the program by hand on paper.

Code:

#include <string.h>
#include <stdio.h>

#define N 100
#define LEN 20 // max length of a word

int fill(char matrix[N][LEN], char* data)
{
    // How many words in 'data'?
    int counter = 0;
    // Array to store current word
    char word[LEN];
    // Counter 'i' for 'word'
    int i;
    // While there is still something to read from 'data' and a free row in 'matrix'
    while(*data != '\0' && counter < N)
    {
        // We seek a new word
        i = 0;
        // While the current character of 'data' is not a whitespace or a null-terminator
        while(*data != ' ' && *data != '\0')
        {
            // Copy that character to 'word' if there is room (words longer than
            // LEN - 1 characters are truncated), then move to the next character of 'data'.
            if(i < LEN - 1)
                word[i++] = *data;
            data++;
        }
        // Null-terminate 'word'. 'i' is already at the value we desire, from the loop above.
        word[i] = '\0';
        // If the current character of 'data' is not a null-terminator (thus it's a whitespace)
        if(*data != '\0')
            // Increment the pointer, so that we skip the whitespace (and be ready to read the next word)
            data++;
        // Copy the word to the counter-th row of the matrix, and increment the counter
        strcpy(matrix[counter++], word);
    }

    return counter;
}

void print(char matrix[N][LEN], int words_no)
{
   for(int i = 0; i < words_no; ++i)
       printf("%s\n", matrix[i]);
}

int main(void)
{
    char data[] = "Alexander the Great";
    // We will store each word of 'data' to a matrix, of 'N' rows and 'LEN' columns
    char matrix[N][LEN] = {0};
    int words_no;
    // 'fill()' populates 'matrix' with 'data' and returns the number of words contained in 'data'.
    words_no = fill(matrix, data);
    print(matrix, words_no);
    return 0;
}

Output:

Alexander
the
Great

The gist of the code lies in the function fill(), which takes data and:

  1. Finds a word.
  2. Stores that word character by character to an array called word.
  3. Copies that word to the matrix.

The tricky part is finding the word. You need to iterate over the string and stop when you meet a whitespace, which signals that every character you read in that iteration is in fact a letter of the same word.

However, you need to be careful when searching for the last word of the string, because you will not meet a whitespace when you reach it. For that reason, you also need to check for the end of the string; in other words, the null terminator.

When you reach it, copy that last word into the matrix and you are done, but make sure to update the pointer correctly: skip a character only when it is a whitespace, and never step past the null terminator (this is where the paper idea I gave you will help a lot in understanding).

Upvotes: 1

Arkia
Arkia

Reputation: 126

1) This is happening because you don't add '\0' to the end of the word before printing it out. After your program encounters the first space, first_word looks something like this: {'T', 'h', 'i', 's', '\0', '\0', ...}, and is printed out just fine. Calling strcpy(first_word, "") changes this to {'\0', 'h', 'i', 's', '\0', ...}; then reading in the next word "is" overwrites the first two characters of the string, resulting in {'i', 's', 'i', 's', '\0', ...}, and thus first_word is now the string "isis", as shown in the output. This can be fixed by simply adding first_word[index_count] = '\0' before printing the string.

2.1) The reason this array contains the same string in each index is that arr_of_strings is an array of string pointers that all point to the same string, first_word, which will contain the last word of the sentence at the end of the loop. (This is also why your strcpy() attempt crashed: the pointers in arr_of_strings are never initialized, so strcpy() writes through an indeterminate address.) This can be solved in a couple of ways; one of them is to make arr_of_strings a two-dimensional array like char arr_of_strings[MAX_STRINGS][MAX_LENGTH + 1] and then add your strings into that array with strcpy(arr_of_strings[arr_index_count], first_word).

2.2) Finally, the reason it only prints "sentence." three times is that you're only checking for a space to signify the end of a word. "sentence." ends with the null terminator '\0', so it's never added to the array of words, and the output also doesn't have a line "Word is: sentence.".

Upvotes: 2

gsamaras
gsamaras

Reputation: 73366

arr_of_strings is just one array of char pointers, and you make all of them point to the same array, first_word. Moreover, you never null-terminate the words, which is needed for a C string.


Here is an approach that uses strtok and might help you:

#include <string.h>
#include <stdio.h>

#define N 100
#define LEN 20 // max length of a word

int fill(char matrix[N][LEN], char* data)
{
    // How many words in 'data'?
    int counter = 0;
    char * pch;
    // Splits 'data' to tokens, separated by a whitespace
    pch = strtok (data," ");
    // Stop when there are no more tokens or no free row is left in 'matrix'
    while (pch != NULL && counter < N)
    {
        // Copy a word to the correct row of 'matrix', truncating it to LEN - 1 characters if needed
        snprintf(matrix[counter++], LEN, "%s", pch);
        pch = strtok (NULL, " ");
    }
    return counter;
}

void print(char matrix[N][LEN], int words_no)
{
   for(int i = 0; i < words_no; ++i)
       printf("%s\n", matrix[i]);
}

int main(void)
{
    char data[] = "New to the C programming language";
    // We will store each word of 'data' to a matrix, of 'N' rows and 'LEN' columns
    char matrix[N][LEN] = {0};
    int words_no;
    // 'fill()' populates 'matrix' with 'data' and returns the number of words contained in 'data'.
    words_no = fill(matrix, data);
    print(matrix, words_no);
    return 0;
}

Output:

New
to
the
C
programming
language

Upvotes: 2
