ryyst
ryyst

Reputation: 9781

C string arrays

I just wrote a string splitting function:

typedef enum {
    strspl_allocation_error = 1
} strspl_error;


int strspl(const char *string, const char separator, char ***result) {

    const int stringLength = strlen(string);
    int lastSplit = 0;
    int numberOfComponents = 1;

    // Compute the number of components
    for (int i = 0; i <= stringLength; i++) {
        if (string[i] == separator)
            numberOfComponents++;
    }

    // Allocate space to hold pointers to each component
    *result = (char **) malloc(numberOfComponents * sizeof(char *));
    if (result == NULL)
        return strspl_allocation_error;

    numberOfComponents = 0;

    for (int i = 0; i <= stringLength; i++) {
        char c = string[i];
        if (c == separator || i == stringLength) {

            const int componentLength = i - lastSplit;

            // Allocate space to hold the component
            char *component = (char *) malloc(componentLength * sizeof(char));
            if (component == NULL)
                return strspl_allocation_error;

            // Copy the characters from the string into the component
            for (int j = 0; j < componentLength; j++)
                component[j] = string[lastSplit + j];
            component[componentLength] = '\0';

            // Put the component into the array
            *result[numberOfComponents] = component;

            lastSplit = i + 1;
            numberOfComponents++;
        }
    }

    return numberOfComponents;
}

Example:

char **result;
int r = strspl("aaaBcBddddeeBk", 'B', result);

for (int i = 0; i < r; i++)
    printf("component: %s\n", result[i]);

should output:

component: aaa
component: c
component: ddddee
component: k

But when I run it, it either crashes or returns garbage values. I don't understand where I'm making a mistake...

Update: here's a hopefully bugfree version:

int strspl(const char *string, const char separator, char ***results) {

    const char *separatorString = (char[]){separator, '\0'};
    int numberOfComponents = 0;
    int stringLength = strlen(string);

    int lastCharacterWasSeparator = 1;

    // Compute the number of components
    for (int i = 0; i < stringLength; i++) {
        if (string[i] != separator) {
            if (lastCharacterWasSeparator)
                numberOfComponents++;
            lastCharacterWasSeparator = 0;
        }
        else
            lastCharacterWasSeparator = 1;
    }

    // Allocate space to hold pointers to components
    *results = malloc(numberOfComponents * sizeof(**results));

    char *stringCopy = strdup(string); // A reference to the copy of the string to modify it and to free() it later.
    char *strptr = stringCopy; // This will be used to iterate through the string.
    int componentLength = 0;
    int component = 0;

    while (component < numberOfComponents) {

        // Move to the startpoint of the next component.
        while (componentLength == 0) {
            componentLength = strcspn(strptr, separatorString);

            // Break out the while loop if we found an actual component.
            if (componentLength != 0)
                break;

            // If we found two adjacent separators, we just "silently" move over them.
            strptr += componentLength + 1;
        }

        // Replace the terminating separator character with a NULL character.
        strptr[componentLength] = '\0';

        // Copy the new component into the array.
        (*results)[component++] = strdup(strptr);

        // Move the string pointer ahead so we can work on the next component.
        strptr += componentLength + 1;

        componentLength = 0;
    }

    // Free the copy of the string.
    free(stringCopy);

    return numberOfComponents;
}

Upvotes: 3

Views: 703

Answers (3)

SiegeX
SiegeX

Reputation: 140227

  1. You never assign the component strings into the result array
  2. You have an off-by-one error when you malloc component because you don't account for the NUL
  3. You need one more layer of indirection so you can alter the char **result when you pass it to your function
  4. Don't cast malloc()'s return value in C. See here as to why

See THIS LINK for a working implementation.

Update

Here is how I would implement strspl()

int strspl(const char *string, const int separator, char ***result)
{
    int len, n=1, i=0;
    char *strptr = *copy = strdup(string);

    while(*strptr++) {
        if (*strptr == separator){ n++; }
    }

    *result = malloc(n * sizeof(**result));

    //Reset Pointer to Beginning of string
    strptr = copy;

    while(len = strcspn(strptr, (char []){separator, '\0'})) {
        strptr[len] = '\0';
        (*result)[i++] = strdup(strptr);
        strptr += len + 1;
    }

    free(copy);
    return n;
}

Output

component: aaa
component: c
component: ddddee
component: k

Upvotes: 1

Miquella
Miquella

Reputation: 1084

Sorry about that, all of what we suggested for you to do to fix it combined together and broke it again! From your original code, here are the adjustments you need:

  1. The signature of the function needs to be char ***result instead of char **result.
  2. The array allocation should be *result = malloc(...) instead of result = malloc(...).
  3. The component pointer was never being stored in the results array, a line reading: (*result)[numberOfComponents] = component; should be put after component[componentLength] = '\0'; (the parenthesis are needed because the result parameter was changed to char***).
  4. And finally, the call to the function should be something like this: strspl(..., &result); instead of strspl(..., result);

Pointers have always been one of the more difficult things to understand when working in C/C++... Let me see if I can explain this:

Let's say we have the caller with a stack like this:

Address     -  Data        -  Description
0x99887760  -  0xbaadbeef  -  caller-result variable (uninitialized garbage)

When the call is made like this: strspl(..., result);, the compiler copies the local pointer (0xbaadbeef) onto the stack of strspl:

Address     -  Data        -  Description
0x99887750  -  0xbaadbeef  -  strspl-result variable (copy of uninitialized garbage)
...
0x99887760  -  0xbaadbeef  -  caller-result variable (uninitialized garbage)

Now when we call result = malloc(...) and copy the result into the local strspl-result variable, we get:

Address     -  Data        -  Description
0x99887750  -  0x01000100  -  strspl-result variable (new array)
...
0x99887760  -  0xbaadbeef  -  caller-result variable (uninitialized garbage)

Obviously not updating the caller's result variable.


If instead we call with the address of the result variable: strspl(..., &result); we get this:

Address     -  Data        -  Description
0x99887750  -  0x99887760  -  strspl-result variable (pointer to the caller's result)
...
0x99887760  -  0xbaadbeef  -  caller-result variable (uninitialized garbage)

And then when we call result = malloc(...) we get this:

Address     -  Data        -  Description
0x99887750  -  0x01000100  -  strspl-result variable (new array)
...
0x99887760  -  0xbaadbeef  -  caller-result variable (uninitialized garbage)

Still not quite what we want, because the caller never gets the pointer to the array.


If instead we call *result = malloc(...) we get this:

Address     -  Data        -  Description
0x99887750  -  0x99887760  -  strspl-result variable (pointer to the caller's result)
...
0x99887760  -  0x01000100  -  caller-result variable (new array)

This way, when we return, we've overwritten the caller's garbage with our newly malloc'd array.


As you can see, the compiler is copying the address of the caller's variable onto the stack of the function being called. Because it's copied, the function cannot modify it unless the caller passes a pointer to it's variable (this is why it needed to be char*** not char**).

I hope that clears things up without making it harder to understand! :-P

Upvotes: 3

Elalfer
Elalfer

Reputation: 5338

  1. First of all it is more correct to use for(int i = 0; i < stringLength; i++), but it shouldn't be a big mistake in your case.
  2. If you want to return an array of char * you should pass char *** result and assign *result = (char**) malloc(...).
  3. I do not see where you saving component pointer into your result array
  4. I think it is better to use strdup or strcpy to copy strings

Upvotes: 1

Related Questions