Reputation: 2655
What would be an efficient way of converting a delimited string into an array of strings in C (not C++)? For example, I might have:
char *input = "valgrind --leak-check=yes --track-origins=yes ./a.out";
The source string will always use a single space as the delimiter, and I would like a malloc'ed array of malloc'ed strings, char *myarray[], such that:
myarray[0]=="valgrind"
myarray[1]=="--leak-check=yes"
...
Edit: I have to assume that there can be an arbitrary number of tokens in the input string, so I can't just limit it to 10 or something.
I've attempted a messy solution with strtok and a linked list I implemented, but valgrind complained so much that I gave up.
(If you're wondering, this is for a basic Unix shell I'm trying to write.)
Upvotes: 5
Views: 6751
Reputation: 34592
Looking at the other answers, a beginner in C might find them hard to follow because the code is so compact, so I thought I would post something more beginner-friendly. It might be easier to actually parse the string yourself instead of using strtok, something like this:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <ctype.h>

char **parseInput(const char *str, int *nLen);
int resizeptr(char ***ptr, int nLen);

int main(void){
    int maxLen = 0;
    int i = 0;
    char **ptr = NULL;
    const char *str = "valgrind --leak-check=yes --track-origins=yes ./a.out";
    ptr = parseInput(str, &maxLen);
    if (!ptr){
        printf("Error!\n");
        return 1;
    }
    for (i = 0; i < maxLen; i++)
        printf("%s\n", ptr[i]);
    for (i = 0; i < maxLen; i++)
        free(ptr[i]);
    free(ptr);
    return 0;
}

/* Walks the string once. charPos counts the characters of the current token;
   when a space or the final '\0' ends a token, the array is grown by one
   entry and the token is copied into a fresh buffer.
   For brevity, the error paths do not free what was already allocated. */
char **parseInput(const char *str, int *Index){
    char **pStr = NULL;
    const char *ptr = str;
    int charPos = 0, indx = 0;
    for (;;){
        if (*ptr && !isspace((unsigned char)*ptr)){
            charPos++;                              /* still inside a token */
        } else if (charPos > 0){
            /* the token is the charPos characters just before ptr */
            if (resizeptr(&pStr, ++indx) != 0) return NULL;
            pStr[indx-1] = malloc(charPos + 1);     /* +1 for the '\0' */
            if (!pStr[indx-1]) return NULL;
            strncpy(pStr[indx-1], ptr - charPos, charPos);
            pStr[indx-1][charPos] = '\0';           /* strncpy copied no terminator */
            charPos = 0;
        }
        if (!*ptr) break;                           /* end of input */
        ptr++;
    }
    *Index = indx;
    return pStr;
}

/* Grows *ptr to hold nLen pointers. realloc(NULL, n) behaves like malloc(n),
   so the first call needs no special case. Returns 0 on success, -1 on failure. */
int resizeptr(char ***ptr, int nLen){
    char **tmp = realloc(*ptr, nLen * sizeof(char *));
    if (!tmp){
        perror("realloc");
        return -1;
    }
    *ptr = tmp;
    return 0;
}
I slightly modified the code to make it easier to follow. The only string function I used is strncpy. Sure, it is a bit long-winded, but it grows the array of strings dynamically with realloc instead of using a hard-coded MAX_ARGS, which would reserve room for many pointers up front when only 3 or 4 are needed; this keeps memory usage small. The parsing itself is simple: the code walks the string with a pointer and uses isspace to find where each token ends. When a token ends, it reallocates the double pointer to hold one more entry and mallocs just enough room to copy the token in.
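Notice how the triple pointer is used in the resizeptr function: passing the address of the char ** lets the function store the block returned by realloc (which may have moved) back into the caller's variable. In fact, I thought this would serve as a good example of a simple C program covering pointers, realloc, malloc, passing by reference, and the basics of parsing a string.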
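To make the triple-pointer idea concrete, here is a minimal sketch of the same pattern in isolation. append_string is a hypothetical name, not part of the code above: it appends a heap copy of one string to a growable char ** array and, because realloc may move the block, writes the new address back through the char *** parameter.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: appends a heap copy of s to *arr, which currently
   holds *count entries. Taking char *** lets it store the (possibly moved)
   block returned by realloc back into the caller's char ** variable.
   Returns 0 on success, -1 on allocation failure (*count is unchanged). */
int append_string(char ***arr, int *count, const char *s)
{
    char **tmp = realloc(*arr, (*count + 1) * sizeof(char *));
    if (tmp == NULL)
        return -1;
    *arr = tmp;                      /* the caller now sees the new block */

    size_t len = strlen(s);
    char *copy = malloc(len + 1);    /* +1 for the terminating '\0' */
    if (copy == NULL)
        return -1;
    memcpy(copy, s, len + 1);        /* copies the '\0' as well */
    tmp[*count] = copy;
    (*count)++;
    return 0;
}
```

The caller starts with char **arr = NULL; int n = 0; and calls append_string(&arr, &n, token) once per token. If the function took a plain char **, the realloc result would be lost when it returned.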
Hope this helps, Best regards, Tom.
Upvotes: 0
Reputation: 34148
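If you have all of the input in input to begin with, then you can never have more than strlen(input) + 1 tokens. If you don't allow "" as a token, each token takes at least one character and is separated from the next by at least one space, so you can never have more than (strlen(input) + 1) / 2 tokens (for example, "a b c" has length 5 and 3 tokens). So unless input is huge, you can safely write: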
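char ** myarray = malloc( ((strlen(input) + 1) / 2) * sizeof(char*) );
int NumActualTokens = 0;
char * pToken;
/* get_token_copy() and skip_token() are helpers you write yourself */
while ((pToken = get_token_copy(input)) != NULL)
{
    myarray[NumActualTokens++] = pToken;
    input = skip_token(input);
}
if (NumActualTokens > 0)
{
    char ** shrunk = (char**) realloc(myarray, NumActualTokens * sizeof(char*));
    if (shrunk) myarray = shrunk; /* on failure, keep the larger block */
}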
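A minimal, runnable sketch of the approach above, assuming the delimiter is only a single space as in the question. The bodies of get_token_copy and skip_token are my own guesses at what those helpers would do, and split_bounded is a hypothetical wrapper name.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns a malloc'ed copy of the first space-delimited
   token in s (leading spaces are skipped), or NULL if no token remains.
   A malloc failure also yields NULL, which simply ends the split early. */
char *get_token_copy(const char *s)
{
    s += strspn(s, " ");             /* skip leading spaces */
    size_t len = strcspn(s, " ");    /* length up to the next space or '\0' */
    if (len == 0)
        return NULL;
    char *copy = malloc(len + 1);
    if (copy != NULL) {
        memcpy(copy, s, len);
        copy[len] = '\0';
    }
    return copy;
}

/* Hypothetical helper: returns a pointer just past the first token. */
const char *skip_token(const char *s)
{
    s += strspn(s, " ");
    return s + strcspn(s, " ");
}

/* Allocates for the (strlen + 1) / 2 worst case, fills the array,
   then shrinks it to fit. Stores the token count in *ntokens. */
char **split_bounded(const char *input, int *ntokens)
{
    size_t max = (strlen(input) + 1) / 2;
    char **myarray = malloc((max > 0 ? max : 1) * sizeof(char *));
    int n = 0;
    char *pToken;

    if (myarray == NULL)
        return NULL;
    while ((pToken = get_token_copy(input)) != NULL) {
        myarray[n++] = pToken;
        input = skip_token(input);
    }
    if (n > 0) {
        char **shrunk = realloc(myarray, n * sizeof(char *));
        if (shrunk != NULL)          /* keep the larger block if realloc fails */
            myarray = shrunk;
    }
    *ntokens = n;
    return myarray;
}
```

The upper bound holds because get_token_copy splits only on spaces; if you also split on tabs, count those as delimiters in the bound as well.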
As a further optimization, you can keep input around, replace each space with \0, and store pointers into the input buffer in myarray[]. There is no need for a separate malloc for each token unless you need to free them individually. (The buffer must be writable, so this does not work directly on a string literal.)
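A minimal sketch of that in-place variant, assuming single-space delimiters. split_in_place is a hypothetical name; it makes one writable copy of the input so that string literals are safe to pass, and it NULL-terminates the array, which is the form execvp() expects for a shell.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical sketch: copies input once into *buf_out, turns each space
   into '\0', and points the array entries into that single buffer.
   The array is NULL-terminated. The caller frees *buf_out and the array:
   two free() calls in total, regardless of the number of tokens. */
char **split_in_place(const char *input, char **buf_out)
{
    size_t len = strlen(input);
    char *buf = malloc(len + 1);
    /* at most (len + 1) / 2 tokens, plus one slot for the NULL */
    char **args = malloc(((len + 1) / 2 + 1) * sizeof(char *));
    size_t n = 0;

    if (buf == NULL || args == NULL) {
        free(buf);
        free(args);
        return NULL;
    }
    memcpy(buf, input, len + 1);

    for (size_t i = 0; i < len; i++) {
        if (buf[i] == ' ')
            buf[i] = '\0';                    /* ends the current token */
        else if (i == 0 || buf[i - 1] == '\0')
            args[n++] = &buf[i];              /* first char of a new token */
    }
    args[n] = NULL;
    *buf_out = buf;
    return args;
}
```

Usage: char *buf; char **args = split_in_place(line, &buf); ... free(buf); free(args);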
Upvotes: 2
Reputation: 20686
From the strsep(3) manpage on OS X:
char **ap, *argv[10], *inputstring;
for (ap = argv; (*ap = strsep(&inputstring, " \t")) != NULL;)
if (**ap != '\0')
if (++ap >= &argv[10])
break;
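Edited for an arbitrary number of tokens:
char **ap, **argv, *inputstring; /* inputstring must point to writable memory, e.g. from strdup() */
int arglen = 10;
argv = calloc(arglen, sizeof(char*));
for (ap = argv; (*ap = strsep(&inputstring, " \t")) != NULL;)
    if (**ap != '\0')
        if (++ap >= &argv[arglen])
        {
            arglen += 10;
            argv = realloc(argv, arglen * sizeof(char*)); /* realloc takes a size in bytes */
            ap = &argv[arglen-10];
        }
/* when the loop ends, *ap holds the NULL from strsep, so argv is NULL-terminated */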
Or something close to that. The above may not work, but if not, it's not far off. Building a linked list would be more efficient than repeatedly calling realloc, but that's really beside the point: the point is how best to make use of strsep.
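A minimal sketch of the strsep approach packaged as a function, with the realloc result checked before it replaces the old pointer. split_strsep is a hypothetical name. strsep() is a BSD/glibc extension rather than ISO C; the _DEFAULT_SOURCE define is there on the assumption of glibc, where it exposes strsep() even under a strict -std flag.

```c
#define _DEFAULT_SOURCE   /* exposes strsep() in glibc's <string.h> */
#include <stdlib.h>
#include <string.h>

/* Splits a writable string in place with strsep(), growing the array
   in steps of 10. Empty fields (from repeated delimiters) are skipped.
   The result is NULL-terminated and its entries point into inputstring,
   so only the array itself must be freed. Returns NULL if allocation fails. */
char **split_strsep(char *inputstring)
{
    size_t arglen = 10, n = 0;
    char **args = malloc(arglen * sizeof(char *));
    char *tok;

    if (args == NULL)
        return NULL;
    while ((tok = strsep(&inputstring, " \t")) != NULL) {
        if (*tok == '\0')
            continue;                        /* skip empty fields */
        if (n + 1 >= arglen) {               /* keep room for the final NULL */
            char **tmp = realloc(args, (arglen + 10) * sizeof(char *));
            if (tmp == NULL) {
                free(args);
                return NULL;
            }
            args = tmp;
            arglen += 10;
        }
        args[n++] = tok;
    }
    args[n] = NULL;
    return args;
}
```

Because strsep writes '\0' into the string, pass a char array or a strdup()'ed copy, never a string literal.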
Upvotes: 1
Reputation: 133587
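What about something like this:
#define MAX_ARGS 64 /* example limit; pick what suits your shell */
char string[] = "valgrind --leak-check=yes --track-origins=yes ./a.out"; /* strtok writes into its argument, so use an array, not a pointer to a literal */
char** args = calloc(MAX_ARGS, sizeof(char*)); /* zeroed, so args stays NULL-terminated */
char* curToken = strtok(string, " \t");
for (int i = 0; curToken != NULL && i < MAX_ARGS - 1; ++i)
{
    args[i] = strdup(curToken);
    curToken = strtok(NULL, " \t");
}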
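Since the question asks for an arbitrary number of tokens, here is a minimal sketch of the same strtok loop with a growing array instead of MAX_ARGS, plus a matching free function so that valgrind has nothing to report. split_strtok and free_args are hypothetical names; the copying uses malloc and strcpy instead of strdup only to stay within ISO C. Note that strtok keeps hidden state and is not reentrant.

```c
#include <stdlib.h>
#include <string.h>

/* Frees every string in a NULL-terminated array, then the array itself. */
void free_args(char **args)
{
    if (args == NULL)
        return;
    for (size_t i = 0; args[i] != NULL; i++)
        free(args[i]);
    free(args);
}

/* Returns a NULL-terminated, malloc'ed array of malloc'ed tokens, split on
   spaces and tabs. input is not modified: strtok() runs on a private copy.
   The array doubles when full. Returns NULL on allocation failure. */
char **split_strtok(const char *input)
{
    size_t cap = 4, n = 0;
    char *work = malloc(strlen(input) + 1);
    char **args = calloc(cap, sizeof(char *));   /* args[0] starts as NULL */
    char *tok;

    if (work == NULL || args == NULL) {
        free(work);
        free(args);
        return NULL;
    }
    strcpy(work, input);
    for (tok = strtok(work, " \t"); tok != NULL; tok = strtok(NULL, " \t")) {
        if (n + 1 >= cap) {                      /* keep room for the NULL */
            char **tmp = realloc(args, cap * 2 * sizeof(char *));
            if (tmp == NULL)
                goto fail;
            args = tmp;
            cap *= 2;
        }
        args[n] = malloc(strlen(tok) + 1);       /* +1 for the '\0' */
        if (args[n] == NULL)
            goto fail;
        strcpy(args[n], tok);
        args[++n] = NULL;                        /* keep the array terminated */
    }
    free(work);
    return args;

fail:                                            /* args is NULL-terminated here */
    free(work);
    free_args(args);
    return NULL;
}
```

Every pointer the caller receives is released by a single free_args(args) call.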
Upvotes: 2
Reputation: 1647
Did you remember to malloc an extra byte for the terminating '\0' that marks the end of each string?
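A minimal illustration of what that means when copying a token of known length. copy_token is a hypothetical name; the point is the len + 1 allocation.

```c
#include <stdlib.h>
#include <string.h>

/* Copies the len characters at start into a new heap string. The
   allocation is len + 1 bytes: len for the characters plus one for the
   terminating '\0'. Allocating only len bytes and then writing the '\0'
   is a one-byte heap overflow, which valgrind reports as an invalid write. */
char *copy_token(const char *start, size_t len)
{
    char *s = malloc(len + 1);
    if (s == NULL)
        return NULL;
    memcpy(s, start, len);
    s[len] = '\0';
    return s;
}
```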
Upvotes: 1