Reputation: 9781
I just wrote a string splitting function:
typedef enum {
strspl_allocation_error = 1
} strspl_error;
int strspl(const char *string, const char separator, char ***result) {
const int stringLength = strlen(string);
int lastSplit = 0;
int numberOfComponents = 1;
// Compute the number of components
for (int i = 0; i <= stringLength; i++) {
if (string[i] == separator)
numberOfComponents++;
}
// Allocate space to hold pointers to each component
*result = (char **) malloc(numberOfComponents * sizeof(char *));
if (result == NULL)
return strspl_allocation_error;
numberOfComponents = 0;
for (int i = 0; i <= stringLength; i++) {
char c = string[i];
if (c == separator || i == stringLength) {
const int componentLength = i - lastSplit;
// Allocate space to hold the component
char *component = (char *) malloc(componentLength * sizeof(char));
if (component == NULL)
return strspl_allocation_error;
// Copy the characters from the string into the component
for (int j = 0; j < componentLength; j++)
component[j] = string[lastSplit + j];
component[componentLength] = '\0';
// Put the component into the array
*result[numberOfComponents] = component;
lastSplit = i + 1;
numberOfComponents++;
}
}
return numberOfComponents;
}
Example:
char **result;
int r = strspl("aaaBcBddddeeBk", 'B', result);
for (int i = 0; i < r; i++)
printf("component: %s\n", result[i]);
should output:
component: aaa
component: c
component: ddddee
component: k
But when I run it, it either crashes or returns garbage values. I don't understand where I'm making a mistake...
Update: here's a hopefully bugfree version:
int strspl(const char *string, const char separator, char ***results) {
const char *separatorString = (char[]){separator, '\0'};
int numberOfComponents = 0;
int stringLength = strlen(string);
int lastCharacterWasSeparator = 1;
// Compute the number of components
for (int i = 0; i < stringLength; i++) {
if (string[i] != separator) {
if (lastCharacterWasSeparator)
numberOfComponents++;
lastCharacterWasSeparator = 0;
}
else
lastCharacterWasSeparator = 1;
}
// Allocate space to hold pointers to components
*results = malloc(numberOfComponents * sizeof(**results));
char *stringCopy = strdup(string); // A reference to the copy of the string to modify it and to free() it later.
char *strptr = stringCopy; // This will be used to iterate through the string.
int componentLength = 0;
int component = 0;
while (component < numberOfComponents) {
// Move to the startpoint of the next component.
while (componentLength == 0) {
componentLength = strcspn(strptr, separatorString);
// Break out the while loop if we found an actual component.
if (componentLength != 0)
break;
// If we found two adjacent separators, we just "silently" move over them.
strptr += componentLength + 1;
}
// Replace the terminating separator character with a NULL character.
strptr[componentLength] = '\0';
// Copy the new component into the array.
(*results)[component++] = strdup(strptr);
// Move the string pointer ahead so we can work on the next component.
strptr += componentLength + 1;
componentLength = 0;
}
// Free the copy of the string.
free(stringCopy);
return numberOfComponents;
}
Upvotes: 3
Views: 703
Reputation: 140227
result
arraymalloc
component because you don't account for the NULchar **result
when you pass it to your functionmalloc()
's return value in C. See here as to whySee THIS LINK for a working implementation.
Here is how I would implement strspl()
int strspl(const char *string, const int separator, char ***result)
{
int len, n=1, i=0;
char *strptr = *copy = strdup(string);
while(*strptr++) {
if (*strptr == separator){ n++; }
}
*result = malloc(n * sizeof(**result));
//Reset Pointer to Beginning of string
strptr = copy;
while(len = strcspn(strptr, (char []){separator, '\0'})) {
strptr[len] = '\0';
(*result)[i++] = strdup(strptr);
strptr += len + 1;
}
free(copy);
return n;
}
component: aaa
component: c
component: ddddee
component: k
Upvotes: 1
Reputation: 1084
Sorry about that, all of what we suggested for you to do to fix it combined together and broke it again! From your original code, here are the adjustments you need:
char ***result
instead of char **result
.*result = malloc(...)
instead of result = malloc(...)
.(*result)[numberOfComponents] = component;
should be put after component[componentLength] = '\0';
(the parenthesis are needed because the result parameter was changed to char***
).strspl(..., &result);
instead of strspl(..., result);
Pointers have always been one of the more difficult things to understand when working in C/C++... Let me see if I can explain this:
Let's say we have the caller with a stack like this:
Address - Data - Description
0x99887760 - 0xbaadbeef - caller-result variable (uninitialized garbage)
When the call is made like this: strspl(..., result);
, the compiler copies the local pointer (0xbaadbeef
) onto the stack of strspl
:
Address - Data - Description
0x99887750 - 0xbaadbeef - strspl-result variable (copy of uninitialized garbage)
...
0x99887760 - 0xbaadbeef - caller-result variable (uninitialized garbage)
Now when we call result = malloc(...)
and copy the result into the local strspl-result variable, we get:
Address - Data - Description
0x99887750 - 0x01000100 - strspl-result variable (new array)
...
0x99887760 - 0xbaadbeef - caller-result variable (uninitialized garbage)
Obviously not updating the caller's result variable.
If instead we call with the address of the result variable: strspl(..., &result);
we get this:
Address - Data - Description
0x99887750 - 0x99887760 - strspl-result variable (pointer to the caller's result)
...
0x99887760 - 0xbaadbeef - caller-result variable (uninitialized garbage)
And then when we call result = malloc(...)
we get this:
Address - Data - Description
0x99887750 - 0x01000100 - strspl-result variable (new array)
...
0x99887760 - 0xbaadbeef - caller-result variable (uninitialized garbage)
Still not quite what we want, because the caller never gets the pointer to the array.
If instead we call *result = malloc(...)
we get this:
Address - Data - Description
0x99887750 - 0x99887760 - strspl-result variable (pointer to the caller's result)
...
0x99887760 - 0x01000100 - caller-result variable (new array)
This way, when we return, we've overwritten the caller's garbage with our newly malloc
'd array.
As you can see, the compiler is copying the address of the caller's variable onto the stack of the function being called. Because it's copied, the function cannot modify it unless the caller passes a pointer to it's variable (this is why it needed to be char***
not char**
).
I hope that clears things up without making it harder to understand! :-P
Upvotes: 3
Reputation: 5338
for(int i = 0; i < stringLength; i++)
, but it shouldn't be a big mistake in your case.char *
you should pass char *** result
and assign
*result = (char**) malloc(...)
.component
pointer into your result
arraystrdup
or strcpy
to copy stringsUpvotes: 1