user1662433

Reputation: 23

Parsing a complex string in C

I have a string like this:

"00:00:00 000~00:02:00 0000|~00:01:00 0000;00:01:00 0000~",

I want to extract each item, such as "00:00:00 000".

My idea is to split the string by ";" first, then by "|", and finally by "~".

The problem is that I can't detect an item when it is empty. For example, in "00:01:00 0000~" nothing follows the "~". I want to detect that empty part, give it a default value, and store it somewhere else, but my code never sees it. What is the problem?

Here is my code:

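#define _POSIX_C_SOURCE 200809L   /* for strtok_r */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(int argc, char *argv[])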
{

   char *str1, *str2, *str3, *str4, *token, *subtoken, *subt1, *subt2;
   char *saveptr1, *saveptr2, *saveptr3;
   int j;

   for (j = 1, str1 = argv[1]; ; j++, str1 = NULL) {
       token = strtok_r(str1, ";", &saveptr1);
       if (token == NULL)
           break;
       printf("%d: %s\n", j, token);

       int flag1 = 1; 
       for (str2 = token; ; str2 = NULL) {
           subtoken = strtok_r(str2, "|", &saveptr2);
           if (subtoken == NULL)
               break;
           printf("  %d: --> %s\n", flag1++, subtoken);
           int flag2 = 1;
           for(str3 = subtoken; ; str3 = NULL) {
                subt1 = strtok_r(str3, "~", &saveptr3);
                if(subt1 == NULL) {
                    break;
                }
                printf("      %d: --> %s\n",flag2++, subt1);
           }
       }
   }

   exit(EXIT_SUCCESS);
} /* main */

Upvotes: 2

Views: 435

Answers (3)

perh

Reputation: 1708

It is indeed easier to just write a custom parser in this case.

The version below allocates new strings. If allocating new memory is not desired, change the add_string function to point at start instead and set start[len] to 0 (the input must then be writable rather than const).

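#define _POSIX_C_SOURCE 200809L   /* for strndup */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Store a copy of the len characters at start; empty tokens are skipped. */
static int add_string( char **into, const char *start, int len )
{
    if( len<1 ) return 0;
    if( (*into = strndup( start, len )) )
        return 1;
    return 0;
}

/* The terminating 0 counts as a delimiter so the last token is flushed. */
static int is_delimeter( char x )
{
    static const char delimeters[] = { 0, '~', ',', '|',';' };
    size_t i;

    for( i=0; i<sizeof(delimeters); i++ )
        if( x == delimeters[i] )
            return 1;

    return 0;
}

/* Returns a NULL-terminated array of newly allocated tokens, or NULL. */
static char **split( const char *data )
{
    /* At most (strlen+1)/2 tokens, plus one slot for the NULL terminator. */
    char **res = malloc(sizeof(char *)*(strlen(data)/2+2));
    char **cur = res;
    int last_delimeter = 0, i = 0;   /* i was uninitialized before */

    if( !res ) return NULL;

    do {
        if( is_delimeter( data[i] ) )
        {
            if( add_string( cur, data+last_delimeter,i-last_delimeter) )
                cur++;
            last_delimeter = i+1;
        }
    } while( data[i++] );

    *cur = NULL;
    return res;
}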

An example usage of the function:

int main()
{
    const char test[] = "00:00:00 000~00:02:00 0000|~00:01:00 0000;00:01:00 0000~";
    char **split_test = split( test );
    int i = 0;

    while( split_test[i] )
    {
        fprintf( stderr, "%2d: %s\n", i, split_test[i] );
        free( split_test[i] );
        i++;
    }
    free( split_test );
    return 0;
}
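
The non-allocating variant mentioned above could look roughly like the sketch below. It is only an illustration under a few assumptions: split_in_place and is_delim are hypothetical names that are not part of the code above, the input buffer must be writable, and empty tokens are still skipped, just as add_string does.

```c
#include <stdlib.h>
#include <string.h>

/* Same delimiter set as above; the terminating 0 is handled by the loop. */
static int is_delim(char c)
{
    return c == '~' || c == ',' || c == '|' || c == ';';
}

/*
 * Hypothetical in-place variant of split(): the returned pointers refer
 * into `data` itself, and every delimiter is overwritten with '\0'.
 * The caller frees only the returned array, never the strings.
 * Returns NULL if the allocation fails.
 */
static char **split_in_place(char *data)
{
    size_t len = strlen(data);
    /* At most (len+1)/2 non-empty tokens, plus the NULL terminator. */
    char **res = malloc(sizeof *res * (len / 2 + 2));
    char **cur = res;
    size_t start = 0, i;

    if (res == NULL)
        return NULL;

    for (i = 0; i <= len; i++) {
        if (i == len || is_delim(data[i])) {
            data[i] = '\0';            /* terminate the token in place */
            if (i > start)             /* skip empty tokens */
                *cur++ = data + start;
            start = i + 1;
        }
    }
    *cur = NULL;
    return res;
}
```

Because the tokens live inside the caller's buffer, a string literal cannot be passed directly; copy it into a char array first, and call free() on the returned array only.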

Upvotes: 2

Philip

Reputation: 5917

Instead of splitting the string, it might be more suitable to come up with a simple finite state machine that parses the string. Fortunately, your tokens seem to have an upper limit on their length, which makes things a lot easier:

Iterate over the string and distinguish four different states:

  • current character is not a delimiter, but previous character was (start of token)
  • current character is a delimiter and previous character wasn't (end of token)
  • current and previous character are both not delimiters (store them in temporary buffer)
  • current and previous character are both delimiters (ignore them, read next character)

It should be possible to come up with a very short (10 lines?) and concise piece of code that parses the string as specified.
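
One possible reading of the four states above, as a rough sketch rather than a definitive implementation. The names parse, is_delim and MAX_TOKEN are made up for illustration, the 32-character limit is an assumption about the token length, and the delimiter set (~ | ;) is taken from the question.

```c
#include <stddef.h>

#define MAX_TOKEN 32   /* assumed upper bound on token length */

static int is_delim(char c)
{
    return c == '~' || c == '|' || c == ';';
}

/*
 * Walks `s` once and copies every token into out[0..max_out-1].
 * The end of the string is treated like a delimiter.
 * Returns the number of tokens stored.
 */
static int parse(const char *s, char out[][MAX_TOKEN + 1], int max_out)
{
    size_t n = 0;         /* characters buffered for the current token */
    int count = 0;
    int prev_delim = 1;   /* start of string acts as "after a delimiter" */

    for (;; s++) {
        int cur_delim = (*s == '\0') || is_delim(*s);

        if (!cur_delim) {
            if (prev_delim) {            /* start of token */
                if (count == max_out)
                    break;               /* no room for another token */
                n = 0;
            }
            if (n < MAX_TOKEN)           /* inside token: buffer it */
                out[count][n++] = *s;
        } else if (!prev_delim) {        /* end of token */
            out[count][n] = '\0';
            count++;
        }
        /* delimiter after delimiter: ignored */

        if (*s == '\0')
            break;
        prev_delim = cur_delim;
    }
    return count;
}
```

As written, consecutive delimiters are ignored, which matches the fourth state. If an empty field needs a default value, as in the question, that last branch is the place where it could be detected.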

Upvotes: 1

gammay

Reputation: 6215

You can simplify your algorithm if you first make all delimiters uniform. Replace every occurrence of the other delimiters (; and | in your string) with ~, and the parsing becomes easier. You can do this externally via sed or vim, or programmatically in your C code. Then you should be able to handle the empty-field ('NULL') problem easily. (Personally, I prefer not to use strtok, as it modifies the original string.)
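
A minimal sketch of this idea, assuming the delimiters are ; | ~ as in the question. normalize and next_field are hypothetical helper names, and the default value is whatever the caller passes in. The scan uses strcspn instead of strtok, so empty fields are reported rather than skipped.

```c
#include <stdio.h>
#include <string.h>

/* Replace every ';' and '|' in `s` with '~' (modifies `s` in place). */
static void normalize(char *s)
{
    for (; *s != '\0'; s++)
        if (*s == ';' || *s == '|')
            *s = '~';
}

/*
 * Copy the next '~'-separated field starting at *pos into `field`
 * (at most size-1 characters), substituting `dflt` when the field is
 * empty. Advances *pos past the field and its separator.
 * Returns 1 if a field was produced, 0 when the input is exhausted.
 */
static int next_field(const char **pos, char *field, size_t size,
                      const char *dflt)
{
    const char *p = *pos;
    size_t len;

    if (p == NULL)
        return 0;

    len = strcspn(p, "~");               /* length up to next '~' or end */
    if (len == 0)
        snprintf(field, size, "%s", dflt);
    else
        snprintf(field, size, "%.*s", (int)len, p);

    /* A trailing '~' still yields one final (empty) field. */
    *pos = (p[len] == '~') ? p + len + 1 : NULL;
    return 1;
}
```

To use it, copy the input into a writable buffer, call normalize on it, point a const char * at the start, and call next_field in a loop until it returns 0. Note that making the delimiters uniform discards the information about which level (; or | or ~) separated two fields, so this only fits if that grouping is not needed.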

Upvotes: 2
