Reputation: 4976
How do C programmers usually extract data from a string? I have read a lot about strtok, but I personally dislike the way the function works: having to call it again with NULL as the first argument seems odd to me. I once stumbled upon this little piece of code, which I find pretty sleek:
sscanf(data, "%*[^=]%*c%[^&]%*[^=]%*c%[^&]", usr, pw);
This would extract data from a URL query string (only of the form var1=value&var2=value).
Is there a reason to use strtok over sscanf? Performance, maybe?
Upvotes: 1
Views: 1489
Reputation: 8097
sscanf format strings are not regular expressions: scansets such as %[^&] give only a very limited (though cheap to implement) form of pattern matching, so if you want to do something more complicated, sscanf will not be enough.
That being said, strtok isn't reentrant: it keeps its position in hidden static state, so nested or multithreaded tokenizing is unsafe. POSIX provides strtok_r for those cases.
But generally speaking, the one that ends up running faster for a particular circumstance and is more elegant is often considered to be the most idiomatic for that circumstance.
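To illustrate the reentrancy point, here is a minimal sketch that nests two tokenizers, something plain strtok cannot do safely. It assumes a POSIX system where strtok_r is available; count_pairs is a hypothetical helper name, not something from the question.

```c
#define _POSIX_C_SOURCE 200809L  /* asks POSIX headers to declare strtok_r */
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: counts "key=value" fields in an '&'-separated
 * string. The buffer is modified in place, just as strtok would do. */
size_t count_pairs(char *query)
{
    assert(query != NULL);

    size_t pairs = 0;
    char *outer = NULL;  /* state of the '&' tokenizer */

    for (char *field = strtok_r(query, "&", &outer);
         field != NULL;
         field = strtok_r(NULL, "&", &outer))
    {
        char *inner = NULL;  /* separate state for the '=' tokenizer */
        char *key = strtok_r(field, "=", &inner);
        char *val = strtok_r(NULL, "=", &inner);

        if (key != NULL && val != NULL)
            pairs++;
    }
    return pairs;
}
```

Each tokenizer keeps its position in its own save pointer, so the inner loop does not disturb the outer one.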
Upvotes: 1
Reputation: 5664
They are each better or more convenient at certain kinds of tasks:
sscanf allows you to concisely specify a fairly complex template for parsing values out of a line of text, but it is very unforgiving. If your input text deviates from the template, the scan stops at the first mismatch and the remaining fields are never assigned. For that reason, it's almost never the right tool to use for human-generated input, for example. It is most useful for scanning automatically generated output, e.g. server log lines.
strtok is much more flexible, but also much more verbose: parsing a line with only a few fields may take many lines of code. It is also destructive: it actually modifies the string that is passed to it, so you may need to make a copy of the data before invoking strtok.
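As a rough sketch of how unforgiving sscanf is, the question's format string can be wrapped in a function that checks the return value. The name parse_login and the 32-byte buffer size are assumptions made for illustration; the field widths (%31) are added so the scan cannot overflow the buffers.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: returns 1 if both values were extracted, else 0.
 * usr and pw must each point to at least 32 bytes (assumed size). */
int parse_login(const char *data, char *usr, char *pw)
{
    assert(data != NULL && usr != NULL && pw != NULL);

    /* %*[^=]  skip everything up to '='  (nothing is stored)
     * %*c     skip the '=' itself
     * %31[^&] store up to 31 characters, stopping at '&' */
    int n = sscanf(data, "%*[^=]%*c%31[^&]%*[^=]%*c%31[^&]", usr, pw);

    /* sscanf returns the number of stored fields (or EOF); anything
     * other than 2 means the input did not fit the template. */
    return n == 2;
}
```

Note that a scanset must match at least one character, so an empty value such as var1=&var2=x makes the whole scan fail.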
Upvotes: 1
Reputation: 7970
strtok is a much simpler, low-level function, mostly used to tokenize strings that have an unknown element count.
Passing NULL tells strtok to continue scanning the same string from the last position, saving you some pointer manipulation and probably (internally to strtok) some initialization.
There's also the matter of readability: looking at the sscanf snippet, it takes some time to understand what's going on.
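For comparison, a typical strtok loop over an unknown number of fields might look like the sketch below. split_in_place is a hypothetical name, and the comma delimiter is only an example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: splits buf on ',' and stores up to max token
 * pointers in tokens[]. The pointers refer into buf, which strtok
 * modifies by overwriting delimiters with '\0'. */
size_t split_in_place(char *buf, char *tokens[], size_t max)
{
    assert(buf != NULL && tokens != NULL);

    size_t n = 0;
    char *tok = strtok(buf, ",");      /* first call: pass the string */

    while (tok != NULL && n < max) {
        tokens[n++] = tok;
        tok = strtok(NULL, ",");       /* later calls: NULL means "continue" */
    }
    return n;
}
```

Consecutive delimiters are treated as one, so "a,b,,c" yields three tokens, not four.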
Upvotes: 1
Reputation: 392
I created a small header file with a few helper functions, such as char **Split(char *a_str, const char a_delim) and int SplitLen(char **array).
If you can improve it in any way, here it is; it is about an hour's work.
#include <stdio.h>   /* printf */
#include <string.h>  /* strlen, strtok, strdup */
#include <stdlib.h>  /* malloc, exit */
#include <assert.h>
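/* Returns a newly allocated copy of `length` characters of `string`,
   starting at the 1-based `position`. The caller must free() the result
   and must make sure the requested range lies inside the string. */
char *substring(char *string, int position, int length)
{
    char *pointer;
    int c;

    pointer = malloc(length + 1);
    if (pointer == NULL)
    {
        printf("Unable to allocate memory.\n");
        exit(EXIT_FAILURE);
    }
    /* Advance to the 1-based start position. */
    for (c = 0; c < position - 1; c++)
        string++;
    /* Copy `length` characters, then terminate the result. */
    for (c = 0; c < length; c++)
    {
        *(pointer + c) = *string;
        string++;
    }
    *(pointer + c) = '\0';
    return pointer;
}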
/* Splits a_str on a_delim and returns a NULL-terminated array of
   strdup()'d tokens (strdup is POSIX); the caller frees each token and
   the array. a_str is modified by strtok, and empty fields are skipped. */
char **Split(char *a_str, const char a_delim)
{
    char **result = 0;
    size_t count = 0;
    char *tmp = a_str;
    char *last_comma = 0;

    /* Count how many elements will be extracted. */
    while (*tmp)
    {
        if (a_delim == *tmp)
        {
            count++;
            last_comma = tmp;
        }
        tmp++;
    }
    /* Add space for trailing token (also when no delimiter was found). */
    count += (last_comma == 0) || (last_comma < (a_str + strlen(a_str) - 1));
    /* Add space for terminating null string so caller
       knows where the list of returned strings ends. */
    count++;
    result = malloc(sizeof(char *) * count);
    if (result)
    {
        char delim[2] = { a_delim, '\0' }; /* strtok needs a delimiter string */
        size_t idx = 0;
        char *token = strtok(a_str, delim);

        while (token)
        {
            assert(idx < count);
            *(result + idx++) = strdup(token);
            token = strtok(0, delim);
        }
        /* strtok skips empty fields, so there may be fewer tokens than
           counted; there must still be room for the terminator. */
        assert(idx < count);
        *(result + idx) = 0;
    }
    return result;
}
/* Counts the entries of a NULL-terminated array such as Split returns. */
static int SplitLen(char **array)
{
    int i = 0;

    while (*array++ != 0)
        i++;
    return i;
}
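/* Returns the index of the first occurrence of ch in str, or -1 if
   ch is empty or does not occur. */
int IndexOf(char *str, char *ch)
{
    size_t str_len = strlen(str);
    size_t ch_len = strlen(ch);
    size_t i;
    size_t cnt;

    if (ch_len == 0 || str_len < ch_len)
        return -1;
    /* Try every start position where the whole of ch still fits. */
    for (i = 0; i + ch_len <= str_len; i++)
    {
        for (cnt = 0; cnt < ch_len; cnt++)
        {
            if (str[i + cnt] != ch[cnt])
                break;
        }
        if (cnt == ch_len)
            return (int)i; /* every character matched */
    }
    return -1;
}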
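/* Returns the index of the first occurrence of ch in str, or -1. */
int IndexOfChar(char *str, char ch)
{
    int i;

    /* Stop at the terminator instead of calling strlen() on every pass. */
    for (i = 0; str[i] != '\0'; i++)
    {
        if (str[i] == ch)
            return i;
    }
    return -1;
}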
A little explanation of the functions: substring() extracts part of a string, and IndexOf() searches for a string inside the source string. The others should be self-explanatory. As I pointed out earlier, this includes a Split function, which you can use instead of calling strtok directly.
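As a side note, the substring search that IndexOf() performs is also available in the standard library as strstr. A minimal sketch of an index-returning wrapper follows; index_of is a hypothetical name, and treating an empty search string as "not found" is an assumption.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical strstr-based helper: returns the index of the first
 * occurrence of needle in str, or -1 if it does not occur. */
int index_of(const char *str, const char *needle)
{
    assert(str != NULL && needle != NULL);

    if (*needle == '\0')
        return -1;  /* assumption: an empty needle counts as not found */

    const char *hit = strstr(str, needle);
    return hit != NULL ? (int)(hit - str) : -1;
}
```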
Upvotes: 0
Reputation: 72667
IMHO the best way is the most readable and understandable way. sscanf and strtok are both poor fits for extracting a user/password from a URL.
Instead, look for the boundaries of the strings you are looking for (in a URL: the slash, the at-sign, the colon, what have you) with strchr and strrchr, then memcpy the bytes between start and end to where you need the data and tack on a NUL. This also allows for appropriate error handling should the string have an unexpected format.
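The boundary-search approach could be sketched as follows for the query-string format from the question (var1=value&var2=value). The name get_param and its signature are made up for illustration, and the key is assumed to contain neither '&' nor '='.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copies the value of `key` from a query string
 * into out (capacity outsz). Returns 0 on success, -1 if the key is
 * missing or the value does not fit. The input is not modified. */
int get_param(const char *query, const char *key, char *out, size_t outsz)
{
    assert(query != NULL && key != NULL && out != NULL);

    size_t klen = strlen(key);
    const char *p = query;

    while (p != NULL && *p != '\0') {
        /* The current field ends at the next '&' or at the end of input. */
        const char *amp = strchr(p, '&');
        const char *end = amp != NULL ? amp : p + strlen(p);

        if (strncmp(p, key, klen) == 0 && p[klen] == '=') {
            const char *val = p + klen + 1;
            size_t len = (size_t)(end - val);

            if (len >= outsz)
                return -1;           /* value would not fit */
            memcpy(out, val, len);
            out[len] = '\0';         /* tack on the NUL */
            return 0;
        }
        p = amp != NULL ? amp + 1 : NULL;  /* move on to the next field */
    }
    return -1;
}
```

Unlike the sscanf one-liner, each failure (missing key, oversized value) has an explicit return path, and the source string is left untouched.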
Upvotes: 2