maximilliano

Reputation: 163

C string parser inside another parser

I have the following string:

    GET /index.html HTTP/1.0;;User-Agent: Wget/1.11.4;;Accept: */*;;Host: www.google.com;;Connection

I use the following code to parse each element:

    while (parser != NULL){
        printf ("%s\n",parser);         
        parser = strtok (NULL, ";;");
    }

This outputs:

    GET /index.html HTTP/1.0
    User-Agent: Wget/1.11.4
    Accept: */*
    Host: www.google.com
    Connection

Now I only need to get the host web address, which in this case is www.google.com. So first I want to separate it from the other fields.

To do that I put another parser inside my previous one like so:

    while (parser != NULL){
        char * pars = strtok (string,":");
        while (pars != NULL) {
            printf("%s\n", pars);
            pars = strtok (NULL, ":");
        }
        parser = strtok (NULL, ";;");
    }

The output of this is garbled, and I do not understand why. Can anyone see the mistake? Thanks

Upvotes: 1

Views: 89

Answers (2)

Floris

Reputation: 46365

There is a big problem with your approach, apart from the issue of strtok not being re-entrant: strtok treats its second argument as a set of delimiter characters, not as one multi-character separator. So strtok(NULL, ";;") will stop at the first ;, not at the first ;;.
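To make the delimiter behavior concrete, here is a minimal sketch. The helper name count_tokens is hypothetical (it is not from the question); it simply counts how many tokens strtok produces for a given delimiter string, working on a private copy of the input.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: counts the tokens strtok produces for `input`
   when split with `delims`. It tokenizes a private copy because strtok
   writes into the buffer it scans. Returns 0 if allocation fails. */
static int count_tokens(const char *input, const char *delims)
{
    size_t len = strlen(input);
    char *copy = malloc(len + 1);
    int count = 0;

    if (copy == NULL)
        return 0;
    memcpy(copy, input, len + 1);   /* copy includes the terminating '\0' */

    /* Every character in `delims` is a separator on its own, and runs of
       separators are skipped, so ";;" and ";" behave identically. */
    for (char *tok = strtok(copy, delims); tok != NULL; tok = strtok(NULL, delims))
        count++;

    free(copy);
    return count;
}
```

With this helper, count_tokens("a;;b;c", ";;") and count_tokens("a;;b;c", ";") should return the same value, because the lone ; between b and c splits the string just as ;; does.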

I would go about this a different way: you are looking for a specific string (";;Host:"), so search for that, then take the text that follows it up to the next ";;". This seems like a more robust solution.

Also note that strtok modifies its argument: it writes a '\0' over the delimiter that ends each token, so you will not be able to re-use the original string after strtok has processed it. If you want to use the string afterwards, you need to make a copy first.
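A minimal sketch of the copy-first idea follows. The helper name first_token_copy is an assumption made for illustration; it tokenizes a scratch copy so the caller's string stays intact.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns a malloc'd copy of the first token of
   `input`, leaving `input` itself untouched. The caller frees the result.
   Returns NULL if allocation fails or no token is found. */
static char *first_token_copy(const char *input, const char *delims)
{
    size_t len = strlen(input);
    char *work = malloc(len + 1);
    char *tok;
    char *result = NULL;

    if (work == NULL)
        return NULL;
    memcpy(work, input, len + 1);

    tok = strtok(work, delims);     /* writes '\0' into work, not into input */
    if (tok != NULL) {
        size_t tlen = strlen(tok);
        result = malloc(tlen + 1);
        if (result != NULL)
            memcpy(result, tok, tlen + 1);
    }

    free(work);
    return result;
}
```

Because only the scratch buffer is handed to strtok, the string passed in can still be searched or printed afterwards.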

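All of which suggests that you want to re-think your parsing strategy. How about:

    #include <stdlib.h>   /* malloc */
    #include <string.h>   /* strstr, memcpy */

    const char *inputString = "GET /index.html HTTP/1.0;;User-Agent: Wget/1.11.4;;Accept: */*;;Host: www.google.com;;Connection";
    const char *temp, *endHost;
    char *hostString;
    size_t nChar;

    temp = strstr(inputString, ";;Host:") + 7;  // point right after "Host:" (the leading space is kept)
    endHost = strstr(temp, ";;");               // start of the next separator
    nChar = (size_t)(endHost - temp);           // number of characters in the host text
    hostString = malloc(nChar + 1);             // +1 for the terminating '\0'
    memcpy(hostString, temp, nChar);
    hostString[nChar] = '\0';                   // production code should also check for NULL results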

This is just to find / extract the host string.

Upvotes: 0

Sergey Kalinichenko

Reputation: 726539

The reason your code does not work is that strtok is non-reentrant. Because the function uses static variables to save its state (this is what lets you call strtok with NULL as the first parameter), you cannot set up calls of strtok in nested loops: once you tell strtok to parse with the ":" delimiter, it "forgets" the state of parsing with the ";" delimiter.

Switching to the re-entrant version of strtok, strtok_r, will fix this problem. This function requires you to supply an extra parameter, savePtr. Important: you need to supply two different savePtr variables to strtok_r in the inner and the outer loops, otherwise the code would exhibit the same behavior.
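The nested loops with two separate save pointers could look roughly like this. It is only a sketch under a few assumptions: the helper name find_host and its signature are made up for illustration, the platform provides POSIX strtok_r, and the host value itself contains no ":" (a port suffix would be cut off).

```c
#define _POSIX_C_SOURCE 200809L  /* asks POSIX systems to declare strtok_r */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: copies the value of the "Host" field into `out`.
   Returns 1 if the field was found, 0 otherwise (or on allocation failure). */
static int find_host(const char *request, char *out, size_t outsz)
{
    size_t len = strlen(request);
    char *copy = malloc(len + 1);
    char *outer_save = NULL;        /* state of the ";" tokenizer */
    int found = 0;

    if (copy == NULL || outsz == 0) {
        free(copy);
        return 0;
    }
    memcpy(copy, request, len + 1); /* strtok_r modifies the buffer it scans */

    for (char *field = strtok_r(copy, ";", &outer_save);
         field != NULL && !found;
         field = strtok_r(NULL, ";", &outer_save)) {
        char *inner_save = NULL;    /* separate state for the ":" tokenizer */
        char *name = strtok_r(field, ":", &inner_save);
        char *value = strtok_r(NULL, ":", &inner_save);

        if (name != NULL && value != NULL && strcmp(name, "Host") == 0) {
            while (*value == ' ')   /* skip the space after "Host:" */
                value++;
            snprintf(out, outsz, "%s", value);
            found = 1;
        }
    }

    free(copy);
    return found;
}
```

Note that the inner call tokenizes the current field (the token returned by the outer loop), not the whole original string, and that each loop owns its own save pointer.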

Note: strtok_r is not part of the C standard; it is specified by POSIX, and most popular C libraries make it available. In case your library does not have strtok_r, locate source code for it and add it to your own code base.

Upvotes: 4
