Reputation: 23
I have a string like this:
"00:00:00 000~00:02:00 0000|~00:01:00 0000;00:01:00 0000~"
I want to extract each of the items, such as "00:00:00 000".
My idea is to split the string by ";" first, then by "|", and finally by "~".
The problem is that I can't get an item when it is empty. For example, in "00:01:00 0000~" the part after "~" is empty. I want to detect that, assign a default value to it, and store it somewhere else, but my code doesn't work. What is the problem?
Here is my code:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(int argc, char *argv[])
{
    char *str1, *str2, *str3, *str4, *token, *subtoken, *subt1, *subt2;
    char *saveptr1, *saveptr2, *saveptr3;
    int j;
    for (j = 1, str1 = argv[1]; ; j++, str1 = NULL) {
        token = strtok_r(str1, ";", &saveptr1);
        if (token == NULL)
            break;
        printf("%d: %s\n", j, token);
        int flag1 = 1;
        for (str2 = token; ; str2 = NULL) {
            subtoken = strtok_r(str2, "|", &saveptr2);
            if (subtoken == NULL)
                break;
            printf(" %d: --> %s\n", flag1++, subtoken);
            int flag2 = 1;
            for (str3 = subtoken; ; str3 = NULL) {
                subt1 = strtok_r(str3, "~", &saveptr3);
                if (subt1 == NULL) {
                    break;
                }
                printf(" %d: --> %s\n", flag2++, subt1);
            }
        }
    }
    exit(EXIT_SUCCESS);
} /* main */
Upvotes: 2
Views: 435
Reputation: 1708
It is indeed easier to just write a custom parser in this case.
The version below allocates new strings. If allocating new memory is not desired, change add_string to just point *into at start and set start[len] to 0; that variant needs a writable input buffer instead of a const one. Note that add_string skips empty items (len < 1).
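#include <stdio.h>
#include <stdlib.h>
#include <string.h>   /* strlen, strndup (POSIX.1-2008) */

/* Copy len bytes starting at start into a new string; empty items are skipped. */
static int add_string( char **into, const char *start, int len )
{
    if( len<1 ) return 0;
    if( (*into = strndup( start, len )) )
        return 1;
    return 0;
}

/* The terminating 0 counts as a delimiter so that the last item is flushed too. */
static int is_delimeter( char x )
{
    static const char delimeters[] = { 0, '~', ',', '|',';' };
    size_t i;
    for( i=0; i<sizeof(delimeters); i++ )
        if( x == delimeters[i] )
            return 1;
    return 0;
}

/* Returns a NULL-terminated array of newly allocated items, or NULL on failure. */
static char **split( const char *data )
{
    /* n characters hold at most (n+1)/2 non-empty items, plus one slot for NULL */
    char **res = malloc(sizeof(char *)*(strlen(data)/2+2));
    char **cur = res;
    int last_delimeter = 0, i = 0;   /* i was uninitialized in the first version */
    if( !res ) return NULL;
    do {
        if( is_delimeter( data[i] ) )
        {
            if( add_string( cur, data+last_delimeter,i-last_delimeter) )
                cur++;
            last_delimeter = i+1;
        }
    } while( data[i++] );
    *cur = NULL;
    return res;
}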
An example usage of the method:
int main()
{
    const char test[] = "00:00:00 000~00:02:00 0000|~00:01:00 0000;00:01:00 0000~";
    char **split_test = split( test );
    int i = 0;
    if( !split_test )   /* allocation failed */
        return 1;
    while( split_test[i] )
    {
        fprintf( stderr, "%2d: %s\n", i, split_test[i] );
        free( split_test[i] );
        i++;
    }
    free( split_test );
    return 0;
}
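The non-allocating variant mentioned above could look roughly like the sketch below. The names split_in_place and is_sep are made up for this illustration, and the sketch assumes the caller passes a writable buffer (a char array, not a string literal). Like the allocating version, it skips empty items.

```c
#include <stdlib.h>
#include <string.h>

/* Returns 1 if c is one of the separators used in the question. */
static int is_sep(char c)
{
    return c == '~' || c == '|' || c == ';';
}

/*
 * In-place split (hypothetical helper): overwrites every separator in buf
 * with '\0' and stores a pointer to each non-empty item in a NULL-terminated
 * array. The items live inside buf, so the caller frees only the array.
 */
static char **split_in_place(char *buf)
{
    size_t len = strlen(buf);
    /* len characters hold at most (len+1)/2 non-empty items, plus NULL */
    char **res = malloc(sizeof *res * (len / 2 + 2));
    size_t n = 0, start = 0, i;

    if (res == NULL)
        return NULL;
    for (i = 0; i <= len; i++) {
        if (i == len || is_sep(buf[i])) {
            if (i > start)              /* skip empty items */
                res[n++] = buf + start;
            buf[i] = '\0';              /* terminate the item in place */
            start = i + 1;
        }
    }
    res[n] = NULL;
    return res;
}
```

To use it, copy the input into a local array such as char buf[] = "...", call split_in_place(buf), loop over the result until the NULL entry, then free the returned array only.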
Upvotes: 2
Reputation: 5917
Instead of splitting the string, it might be more suitable to come up with a simple finite state machine that parses the string. Fortunately, your tokens seem to have an upper limit on their length, which makes things a lot easier:
Iterate over the string and distinguish four different states:
It should be possible to come up with a very short (10 lines?) and concise piece of code that parses the string as specified.
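As a rough illustration of the idea, here is a minimal single-pass scanner. It is only a sketch: it uses two states of its own choosing rather than the four mentioned above, and parse_fields, MAX_FIELD and the fallback parameter are names invented for this example. MAX_FIELD is an assumed upper bound on the item length. Unlike strtok_r, which silently skips empty items, this scanner reports them and fills in the fallback value.

```c
#include <stdio.h>
#include <string.h>

#define MAX_FIELD 32   /* assumed upper bound on item length, including '\0' */

enum state { FIELD_EMPTY, FIELD_OPEN };   /* the two states of this sketch */

/*
 * Scans input once and copies each item into out[]. An item that contains no
 * characters is replaced by fallback. Stops after max_fields items.
 * Returns the number of items stored (an empty input yields one fallback item).
 */
static size_t parse_fields(const char *input, const char *fallback,
                           char out[][MAX_FIELD], size_t max_fields)
{
    enum state st = FIELD_EMPTY;
    size_t count = 0, len = 0;
    const char *p = input;

    if (max_fields == 0)
        return 0;
    for (;;) {
        char c = *p++;
        int at_end = (c == '\0');

        if (at_end || c == '~' || c == '|' || c == ';') {
            if (st == FIELD_EMPTY)
                snprintf(out[count], MAX_FIELD, "%s", fallback);
            else
                out[count][len] = '\0';
            count++;
            st = FIELD_EMPTY;
            len = 0;
            if (at_end || count == max_fields)
                break;
        } else {
            if (len < MAX_FIELD - 1)    /* silently truncate overlong items */
                out[count][len++] = c;
            st = FIELD_OPEN;
        }
    }
    return count;
}
```

With the string from the question and a fallback of "00:00:00 0000", this stores six items, the third and sixth being the fallback.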
Upvotes: 1
Reputation: 6215
You can simplify your algorithm if you first make all delimiters uniform. Replace all occurrences of ";" and "|" with "~", and the parsing becomes easier. You can do this externally with sed or vim, or programmatically in your C code. After that you should be able to handle the empty ('NULL') items easily. (Personally, I prefer not to use strtok, as it modifies the original string.)
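A minimal sketch of that approach, done in C rather than with sed: normalize_delims and next_field are hypothetical helper names, and the sketch works on a copy so the original string is left untouched. It walks the items with strchr instead of strtok, because strtok treats consecutive delimiters as one and would still skip the empty items.

```c
#include <stdlib.h>
#include <string.h>

/* Returns a malloc'd copy of src in which every ';' and '|' becomes '~'. */
static char *normalize_delims(const char *src)
{
    size_t n = strlen(src) + 1;
    char *copy = malloc(n);
    char *p;

    if (copy == NULL)
        return NULL;
    memcpy(copy, src, n);
    for (p = copy; *p != '\0'; p++)
        if (*p == ';' || *p == '|')
            *p = '~';
    return copy;
}

/*
 * Iterator over '~'-separated items. *cursor points at the start of the next
 * item, or is NULL once the input is exhausted. Returns the start of the item
 * and stores its length in *len; an empty item is reported with *len == 0.
 */
static const char *next_field(const char **cursor, size_t *len)
{
    const char *start = *cursor;
    const char *end;

    if (start == NULL)
        return NULL;
    end = strchr(start, '~');
    if (end != NULL) {
        *len = (size_t)(end - start);
        *cursor = end + 1;
    } else {
        *len = strlen(start);   /* last item, possibly empty */
        *cursor = NULL;
    }
    return start;
}
```

A caller would set a cursor to the normalized string, call next_field in a loop until it returns NULL, and substitute a default whenever the reported length is 0. The items are not NUL-terminated individually, so print them with "%.*s" or copy them out.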
Upvotes: 2