Justin D.

Reputation: 4976

C Way to Extract Variables from Strings

How do C programmers usually extract data from a string? I have read a lot about strtok, but I dislike the way the function works: having to call it again with NULL as the first argument seems odd to me. I once stumbled upon this little piece of code, which I find pretty sleek:

sscanf(data, "%*[^=]%*c%[^&]%*[^=]%*c%[^&]", usr, pw);

This extracts the two values from a URL query string (only one of the form var1=value&var2=value).
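
For reference, a bounded variant of that call might look like the sketch below. The function name parse_login, the 31-character limit, and the required buffer sizes are assumptions made for illustration; they are not part of the snippet above. The field widths keep either scanset from overflowing its buffer, and the return value of sscanf is checked so that a non-matching string is reported instead of silently leaving usr and pw unset.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: extracts the first two values from a query string
 * of the form "name1=value1&name2=value2".
 * usr and pw must each have room for at least 32 bytes (31 chars + NUL).
 * Returns 1 on success, 0 if the input does not match the template. */
int parse_login(const char *data, char *usr, char *pw)
{
    assert(data != NULL && usr != NULL && pw != NULL);

    /* %*[^=]   skip everything up to '=' (assignment suppressed)
     * %*c      skip the '=' itself
     * %31[^&]  read at most 31 characters, stopping at '&' */
    int converted = sscanf(data, "%*[^=]%*c%31[^&]%*[^=]%*c%31[^&]", usr, pw);

    /* sscanf returns the number of assigned fields; both are required. */
    return converted == 2;
}
```

Calling parse_login("user=bob&pass=s3", usr, pw) with two 32-byte buffers should fill them with "bob" and "s3"; an empty value such as "user=&pass=x" makes the scan stop early and the function return 0.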

Is there a reason to use strtok over sscanf? Performance maybe?

Upvotes: 1

Views: 1489

Answers (5)

randomusername

Reputation: 8097

sscanf format strings are a very limited pattern language (efficient to implement, but far less expressive than regular expressions), so if you want to do something more complicated, you cannot use sscanf.

That being said, strtok isn't reentrant: it keeps its position in hidden static state, so it is unsafe with threads and with nested tokenizing loops. POSIX provides strtok_r for those cases.
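
As a rough sketch of the reentrant alternative, the following uses POSIX strtok_r with two independent state pointers. The helper name find_value and the query-string layout are assumptions for illustration, and strtok_r is not part of ISO C (some platforms offer a differently named equivalent).

```c
#define _POSIX_C_SOURCE 200809L  /* exposes strtok_r on strict compilers */
#include <assert.h>
#include <string.h>

/* Hypothetical helper: looks up `key` in a query string such as
 * "user=bob&pass=secret". The buffer `query` is modified in place.
 * Returns a pointer to the value inside `query`, or NULL if absent. */
char *find_value(char *query, const char *key)
{
    assert(query != NULL && key != NULL);

    char *outer_state = NULL;   /* position within the '&'-separated list */
    for (char *pair = strtok_r(query, "&", &outer_state);
         pair != NULL;
         pair = strtok_r(NULL, "&", &outer_state))
    {
        /* A second, independent state: with plain strtok this inner
         * call would overwrite the outer loop's saved position. */
        char *inner_state = NULL;
        char *name  = strtok_r(pair, "=", &inner_state);
        char *value = strtok_r(NULL, "=", &inner_state);

        if (name != NULL && value != NULL && strcmp(name, key) == 0)
            return value;
    }
    return NULL;
}
```

Because each loop carries its own state pointer, the same function can also run in several threads at once, as long as each thread works on its own buffer.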

But generally speaking, whichever one runs faster in a particular circumstance and reads more elegantly is usually considered the most idiomatic choice for that circumstance.

Upvotes: 1

Tim Pierce

Reputation: 5664

They are each better or more convenient at certain kinds of tasks:

  • sscanf allows you to concisely specify a fairly complex template for parsing values out of a line of text, but it is very unforgiving. If your input text differs by even a character from your template, the scan will fail. For that reason, it's almost never the right tool to use for human-generated input, for example. It is most useful for scanning automatically generated output, e.g. server log lines.

  • strtok is much more flexible, but also much more verbose: parsing a line with only a few fields may take many lines of code. It is also destructive: it actually modifies the string that is passed to it, so you may need to make a copy of the data before invoking strtok.
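
To illustrate the point about strtok being destructive, here is a minimal sketch that tokenizes a private copy so the caller's string stays intact. The helper name count_fields is an assumption for illustration.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: counts the non-empty fields in `line` without
 * modifying it. Returns -1 if memory cannot be allocated. */
int count_fields(const char *line, const char *delims)
{
    assert(line != NULL && delims != NULL);

    /* strtok writes NUL bytes over the delimiters, so work on a copy. */
    size_t size = strlen(line) + 1;
    char *copy = malloc(size);
    if (copy == NULL)
        return -1;
    memcpy(copy, line, size);

    int fields = 0;
    for (char *tok = strtok(copy, delims); tok != NULL; tok = strtok(NULL, delims))
        fields++;

    free(copy);
    return fields;
}
```

Note that strtok treats a run of adjacent delimiters as a single separator, so "a,b,,c" yields three fields here, not four.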

Upvotes: 1

egur

Reputation: 7970

strtok is a much simpler, low-level function, mostly used to tokenize strings that have an unknown number of elements.

NULL is used to tell strtok to continue scanning the string from the last position, saving you some pointer manipulation and probably (internally to strtok) some initialization.
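
To make the role of the NULL argument concrete, here is an illustrative re-implementation of a strtok-like function. The name my_strtok is hypothetical, and real C libraries may differ in detail; the sketch only shows the general idea of a saved position that a NULL argument resumes from.

```c
#include <assert.h>
#include <string.h>

/* Illustrative re-implementation; not the source of any real C library. */
char *my_strtok(char *str, const char *delims)
{
    static char *saved = NULL;   /* where the previous call stopped */

    assert(delims != NULL);

    if (str == NULL)
        str = saved;             /* NULL means "continue from last time" */
    if (str == NULL)
        return NULL;             /* nothing to continue from */

    str += strspn(str, delims);  /* skip leading delimiters */
    if (*str == '\0') {
        saved = NULL;
        return NULL;             /* no tokens left */
    }

    char *end = str + strcspn(str, delims);  /* end of this token */
    if (*end == '\0') {
        saved = NULL;            /* last token: next call returns NULL */
    } else {
        *end = '\0';             /* terminate the token in place */
        saved = end + 1;
    }
    return str;
}
```

The single static pointer is also what makes this style of interface non-reentrant: a second tokenizing loop started in between would overwrite the saved position.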

There's also the matter of readability. Looking at the sscanf snippet in the question, it takes some time to understand what's going on.

Upvotes: 1

Samuel Allan

Reputation: 392

I created a small header file with a few helper functions for this, such as char **Split(src, sep) and a function that returns the length of the resulting array (SplitLen in the code below). It is about an hour's work, so feel free to improve it in any way.

#include <stdio.h>   /* printf */
#include <string.h>
#include <stdlib.h>  /* malloc, exit */
#include <assert.h>
char *substring(char *string, int position, int length) 
{
   char *pointer;
   int c;

   pointer = malloc(length+1);

   if (pointer == NULL)
   {
      printf("Unable to allocate memory.\n");
      exit(EXIT_FAILURE);
   }

   for (c = 0 ; c < position -1 ; c++) 
      string++; 

   for (c = 0 ; c < length ; c++)
   {
      *(pointer+c) = *string;      
      string++;   
   }

   *(pointer+c) = '\0';

   return pointer;
}

char **Split(char *a_str, const char a_delim)
{
    char **result    = 0;
    size_t count     = 0;
    char *tmp        = a_str;
    char *last_comma = 0;

    /* Count how many elements will be extracted. */
    while (*tmp)
    {
        if (a_delim == *tmp)
        {
            count++;
            last_comma = tmp;
        }
        tmp++;
    }
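    /* Add space for the trailing token, unless the string ends with a
       delimiter. A string with no delimiter at all is one token. */
    count += (last_comma == NULL) || (last_comma < (a_str + strlen(a_str) - 1));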

    /* Add space for terminating null string so caller
       knows where the list of returned strings ends. */
    count++;
    result = malloc(sizeof(char *) * count);

    if (result)
    {
        char delim[2] = { a_delim, '\0' };  // Fix for inconsistent splitting
        size_t idx  = 0;
        char *token = strtok(a_str, delim);

        while (token)
        {
            assert(idx < count);
            *(result + idx++) = strdup(token);
            token = strtok(0, delim);
        }
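        /* strtok skips empty fields, so fewer tokens than counted may
           have been found; only the upper bound is guaranteed. */
        assert(idx < count);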
        *(result + idx) = 0;
    }
    return result;
}
static int SplitLen(char **array)
{
    int i = 0;
    while (*array++ != 0)
        i++;
    return i;
}
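/* Returns the index of the first occurrence of ch in str, or -1. */
int IndexOf(char *str, char *ch)
{
    size_t str_len = strlen(str);
    size_t sub_len = strlen(ch);
    size_t i, cnt;

    if (sub_len == 0 || sub_len > str_len)
        return -1;

    /* Try every start position where the substring could still fit. */
    for (i = 0; i + sub_len <= str_len; i++)
    {
        for (cnt = 0; cnt < sub_len; cnt++)
        {
            if (str[i + cnt] != ch[cnt])
                break;          /* mismatch: try the next start position */
        }
        if (cnt == sub_len)
            return (int)i;      /* every character matched */
    }
    return -1;
}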
int IndexOfChar(char *str, char ch)
{
    int result = -1;
    int i = 0;  
    for(;i<strlen(str); i++)
    {
        if(str[i] == ch)
        {
            result = i; 
            break;
        }
    }
    return result;
}

A little explanation of the functions: substring extracts part of a string (position is 1-based), and IndexOf searches for a string inside the source string. The others should be self-explanatory. As pointed out earlier, this includes a Split function that you can use instead of calling strtok directly.

Upvotes: 0

Jens

Reputation: 72667

IMHO the best way is the most readable and understandable way. Both sscanf and strtok fail that test for your user/pw extraction from a URL.

Instead, look for the boundaries of the strings you are looking for (in a URL the slash, the at-sign, the colon, what have you) with strchr and strrchr, then memcpy the bytes between start and end to where you need the data and tack on a NUL. This also allows for appropriate error handling should the string have an unexpected format.
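
A minimal sketch of that approach follows. The helper names copy_range and extract_login, and the assumed URL shape "scheme://user:password@host/...", are illustrative choices rather than anything prescribed above; the sketch also uses strstr to find "://" and takes the first '@', so a password containing '@' is not handled.

```c
#include <assert.h>
#include <string.h>

/* Copies the bytes in [start, end) into out and appends a NUL.
 * Returns 0 on success, -1 if the range does not fit in cap bytes. */
static int copy_range(const char *start, const char *end, char *out, size_t cap)
{
    size_t len = (size_t)(end - start);
    if (len >= cap)
        return -1;
    memcpy(out, start, len);
    out[len] = '\0';
    return 0;
}

/* Hypothetical helper: extracts user and password from a URL of the
 * assumed form "scheme://user:password@host/...".
 * Returns 0 on success, -1 if the URL does not have that shape or a
 * field does not fit (usr may already be written in the latter case). */
int extract_login(const char *url, char *usr, size_t usr_cap,
                  char *pw, size_t pw_cap)
{
    assert(url != NULL && usr != NULL && pw != NULL);

    const char *start = strstr(url, "://");
    if (start == NULL)
        return -1;
    start += 3;                              /* first byte after "://" */

    const char *at = strchr(start, '@');     /* end of the password */
    if (at == NULL)
        return -1;

    const char *slash = strchr(start, '/');  /* start of the path, if any */
    if (slash != NULL && slash < at)
        return -1;                           /* the '@' belongs to the path */

    const char *colon = strchr(start, ':');  /* end of the user name */
    if (colon == NULL || colon > at)
        return -1;                           /* no "user:password" part */

    if (copy_range(start, colon, usr, usr_cap) != 0)
        return -1;
    return copy_range(colon + 1, at, pw, pw_cap);
}
```

Every boundary lookup has its own failure branch, which is the error-handling advantage mentioned above: the caller learns that the input was malformed or too long instead of receiving truncated or stale data.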

Upvotes: 2
