Alsakka

Reputation: 185

Splitting string into words in array without using any pre-made functions in C

I am trying to create a function that takes a string, splits it into words and returns an array with the words in it. I am not allowed to use any pre-made functions other than malloc within the splitting function. Finally, the function has to have this prototype: char **ft_split_whitespaces(char *str). My current output looks like this:


    d this is me
    s is me
    s me
    r

Expected output:


    Hello
    World
    This
    Is
    Me

My full code is below:


    #include <stdio.h>
    #include <stdlib.h>
    
    int     count_words(char *str)
    {
        int i; 
        int word;
        
        i = 0;
        word = 1;
        while(str[i]!='\0')
        {
            if(str[i]==' ' || str[i]=='\n' || str[i]=='\t' 
            || str[i]=='\f' || str[i]=='\r' || str[i]=='\v')
                word++;
            i++;
        }
        return (word);
    }
    
    char    **ft_split_whitespaces(char *str)
    {
        int index;
        int size;
        int index2;
        char **arr;
        
        index = 0;
        index2 = 0;
        size = count_words(str);
        arr = (char **)malloc(size * sizeof(char));
        if (arr == NULL)
            return ((char **)NULL);
        while (str[index])
        {
            if(str[index] == ' ')
            {
                index++;
                value++;
                index2++;
            }
            else
                *(arr+index2) = (char*) malloc(index * sizeof(char));
                *(arr+index2) = &str[index];    
            index++;
        }
        **arr = '\0';
        return (arr);
    }
    
    int main()
    {
        char a[] = "Hello World This Is Me";
        char **arr;
        int i;
        int ctr = count_words(a);
        arr = ft_split_whitespaces(a);
        
        for(i=0;i < ctr;i++)
            printf("%s\n",arr[i]);
        return 0;
    }

Upvotes: 3

Views: 897

Answers (1)

Zoso

Reputation: 3465

You have quite a few errors in your program:

  1. arr = (char **)malloc(size * sizeof(char)); is not right, since arr is of type char** and each element is a char*. You should use sizeof(char*) or, better, sizeof(*arr). sizeof(char) is always 1, while sizeof(char*) is typically 4 or 8 on modern systems, so your allocation is far too small to hold size pointers.

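  2. The else in ft_split_whitespaces has no braces {}, so only the malloc line belongs to it. The next line, *(arr+index2) = &str[index];, runs on every iteration, including right after the whitespace branch. You probably intended braces there; without them your conditional logic breaks.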

  3. You are allocating a new char[] for every non-whitespace character in the while loop. You should allocate only one per word and then fill in the characters of that array.

  4. *(arr+index2) = &str[index]; doesn't do what you think it does. It only makes the pointer *(arr+index2) point into str at offset index; no characters are copied. You either need to copy each character individually or use memcpy() (which the rules of your exercise probably don't allow). This explains why your output shows tails of the original string rather than the actual words.

  5. **arr = '\0'; overwrites the first character of whatever arr[0] points to, so you lose the string stored at index 0 of arr. Instead, you need to append a '\0' to each string in arr individually.

  6. *(arr+index2) = (char*) malloc(index * sizeof(char)); allocates progressively larger char arrays because you use index, which keeps increasing, as the character count. You need to work out the length of each word and allocate that many characters, plus one for the terminating '\0'.

Also, why *(arr + index2)? Why not use the much easier-to-read arr[index2]?


Further clarifications:

Consider str = "abc de"

You'll start with

    *(arr + 0) = (char*) malloc(0 * sizeof(char));
    // malloc(0) returns either NULL or a pointer that must not be dereferenced, so this allocation is useless here
    *(arr + 0) = &str[0];

Here str[0] is 'a', stored somewhere in memory, so &str[0] is its address, and that address is what you store in *(arr + 0).

Now in the next iteration, you'll have

    *(arr + 0) = (char*) malloc(1 * sizeof(char));
    *(arr + 0) = &str[1];

This time you overwrite the same slot at index2 again, now with a different address. The block returned by malloc in each iteration is lost as soon as the next line overwrites the pointer, so every one of these allocations leaks. The next iteration does *(arr + 0) = (char*) malloc(2 * sizeof(char));, and so on: you keep resetting the same *(arr + index2) slot until you hit a whitespace character, after which the same thing happens for the next word. So don't allocate an array for every value of index, only once per word, when you know its length. This also shows that the size passed to malloc keeps growing with index, which is what #6 pointed out.

Coming to &str[index].

You are setting *(arr + index2), i.e. a char* (pointer to char), to another char*. In C, assigning one pointer to another doesn't copy the contents it points to; it only makes both point to the same memory location. So when you do something like *(arr + 1) = &str[4], *(arr + 1) is just a pointer into the original string at index 4. If you print *(arr + 1), you'll get the substring from index 4 to the end of the string. For "abc de" that happens to be the last word, but for any earlier word you would also get everything after it, not just the word you're trying to extract.

**arr = '\0' dereferences the pointer stored in *arr (that is, arr[0]) and sets the character it points to to '\0'. So if *(arr + 0) pointed to "hello\0", it would now hold "\0ello\0". Anything that walks this string stops at the first '\0' character, so you effectively lose whatever *arr was pointing to.

Also, *(arr + i) and arr[i] are exactly equivalent, and arr[i] is much more readable: it makes clear that arr is being used as an array and that arr[i] accesses its ith element.

Upvotes: 3
