Reputation: 67
I want to split an HTML page into pieces using a tag as the delimiter, such as <img or <div>.
I tried the following code but it doesn't work:
char source[MAXBUFLEN + 1];
FILE *fp = fopen("source.html", "r");
if (fp != NULL)
{
    size_t newLen = fread(source, sizeof(char), MAXBUFLEN, fp);
    if (newLen == 0) {
        fputs("Error reading file", stderr);
    } else {
        source[newLen] = '\0'; /* Terminate right after the last byte read. */
    }
    fclose(fp); /* Only close a stream that was actually opened. */
}
//not working
char* strArray[10];
int i = 0;
char *token = strtok(source, "<img");
while(token != NULL)
{
    strcpy(strArray[i++], token);
    token = strtok(NULL, "<img");
}
printf("%s\n", strArray[3]);
What am I doing wrong? Is there any other method I can use except strtok?
Upvotes: 0
Views: 319
Reputation: 40145
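#include <string.h>

/* Like strtok_r, but splits on the whole string `word` instead of on a set
   of characters. Modifies str in place; *store holds the resume position.
   Unlike strtok, leading or adjacent delimiters yield empty pieces. */
char *strtokByWord_r(char *str, const char *word, char **store){
    char *p, *ret;
    if(str != NULL){
        *store = str;                  /* first call: start at the beginning */
    }
    if(*store == NULL) return NULL;    /* no input left */
    p = strstr(ret=*store, word);
    if(p){
        *p='\0';                       /* terminate the current piece */
        *store = p + strlen(word);     /* resume right after the delimiter */
    } else {
        *store = NULL;                 /* last piece: nothing follows */
    }
    return ret;
}
/* Non-reentrant wrapper that keeps the resume position in a static variable. */
char *strtokByWord(char *str, const char *word){
    static char *store = NULL;
    return strtokByWord_r(str, word, &store);
}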
Then, in your code, replace
char *token = strtok(source, "<img");
...
token = strtok(NULL, "<img");
with
char *token = strtokByWord(source, "<img");
...
token = strtokByWord(NULL, "<img");
Upvotes: 2
Reputation: 9894
As Daren has already posted, strtok() doesn't do what you want. You can use
char *ptr = strstr( source, "<img" );
instead to find the first tag, and then
ptr = strstr(ptr + 4, "<img"); // the search resumes directly after the previous "<img" (4 characters)
                               // maybe you can find a better offset
for the next occurrences, for as long as ptr is not NULL.
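To make the repeated strstr() search concrete, here is a minimal sketch that collects the start of every occurrence without modifying the buffer. The helper name find_all and its parameters are my own invention for illustration, not something from the question.

```c
#include <stddef.h>
#include <string.h>

/* Stores a pointer to the start of each occurrence of word in text,
   up to max entries, and returns how many were stored.
   text is not modified. find_all is a hypothetical helper name. */
size_t find_all(const char *text, const char *word,
                const char **out, size_t max)
{
    size_t n = 0;
    size_t step = strlen(word);

    if (step == 0)
        return 0;               /* an empty word would match everywhere */

    for (const char *p = strstr(text, word);
         p != NULL && n < max;
         p = strstr(p + step, word)) {   /* resume after the previous match */
        out[n++] = p;
    }
    return n;
}
```

Each piece then runs from out[k] up to (but not including) out[k + 1], and the last one runs to the end of the string, so you can copy the pieces out by length instead of writing '\0' into the source.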
Besides, your line
strcpy(strArray[i++], token);
would crash (or otherwise invoke undefined behavior) because the pointers in strArray are uninitialized and have no memory allocated to them.
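One way around that, as a rough sketch: allocate a copy for each token before storing it. copy_string is a name I made up for this example (POSIX systems offer strdup(), which does the same job).

```c
#include <stdlib.h>
#include <string.h>

/* Returns a heap-allocated copy of s, or NULL if allocation fails.
   The caller is responsible for calling free() on the result.
   copy_string is a hypothetical helper name. */
char *copy_string(const char *s)
{
    size_t len = strlen(s) + 1;     /* include the terminating '\0' */
    char *copy = malloc(len);

    if (copy != NULL)
        memcpy(copy, s, len);
    return copy;
}
```

In the loop you would then write strArray[i++] = copy_string(token); instead of the strcpy() call, and also stop once i reaches 10 so the array is not overrun. If the source buffer outlives the array, simply storing the token pointer itself (strArray[i++] = token;) avoids the copy altogether.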
Upvotes: 2
Reputation: 70324
The second argument to strtok is a set of delimiter characters, not a delimiter string. Each of these characters is used to split the string into tokens, so "<img" splits at every '<', 'i', 'm' and 'g'. I don't think it does what you think it does...
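To see the character-set behavior for yourself, here is a small sketch that collects the tokens strtok() produces. The helper split_with_strtok and its parameters are hypothetical names chosen for this example.

```c
#include <stddef.h>
#include <string.h>

/* Splits buf in place with strtok and stores up to max token pointers
   in out. Returns the number of tokens stored.
   split_with_strtok is a hypothetical helper name. */
size_t split_with_strtok(char *buf, const char *delims,
                         char **out, size_t max)
{
    size_t n = 0;

    for (char *tok = strtok(buf, delims);
         tok != NULL && n < max;
         tok = strtok(NULL, delims)) {
        out[n++] = tok;
    }
    return n;
}
```

Called on a writable copy of `<html><img src="test.png"/></html>` with "<img" as the delimiter argument, this yields six tokens, the first of which is "ht": the 'm' in "html" already counts as a delimiter.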
If you want to parse an HTML file into tokens, you could look into lex...
What is your desired output? Do you have a test case for your input?
Your code should produce the following:
input:
<html><img src="test.png"/></html>
output:
I somehow don't think that is what you want...
Upvotes: 0