Reputation: 1009
I'm trying to tokenize a string in C based on \r\n delimiters, and I want to print out each string after each subsequent call to strtok(). Inside a while loop, I do some processing on each token.
When I include the processing code, the only output I receive is the first token; however, when I take the processing code out, I receive every token. This doesn't make sense to me, and I'm wondering what I could be doing wrong.
Here's the code:
#include <stdio.h>
#include <time.h>
#include <string.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <unistd.h>
#include <stdlib.h>
int main()
{
    int c = 0, c2 = 0;
    char *tk, *tk2, *tk3, *tk4;
    char buf[1024], buf2[1024], buf3[1024];
    char host[1024], path[1024], file[1024];
    strcpy(buf, "GET /~yourloginid/index.htm HTTP/1.1\r\nHost: remote.cba.csuohio.edu\r\n\r\n");
    tk = strtok(buf, "\r\n");
    while(tk != NULL)
    {
        printf("%s\n", tk);
        /*
        if(c == 0)
        {
            strcpy(buf2, tk);
            tk2 = strtok(buf2, "/");
            while(tk2 != NULL)
            {
                if(c2 == 1)
                    strcpy(path, tk2);
                else if(c2 == 2)
                {
                    tk3 = strtok(tk2, " ");
                    strcpy(file, tk3);
                }
                ++c2;
                tk2 = strtok(NULL, "/");
            }
        }
        else if(c == 1)
        {
            tk3 = strtok(tk, " ");
            while(tk3 != NULL)
            {
                if(c2 == 1)
                {
                    printf("%s\n", tk3);
                    // strcpy(host, tk2);
                    // printf("%s\n", host);
                }
                ++c2;
                tk3 = strtok(NULL, " ");
            }
        }
        */
        ++c;
        tk = strtok(NULL, "\r\n");
    }
    return 0;
}
Without those if/else statements, I receive the following output:
GET /~yourloginid/index.htm HTTP/1.1
Host: remote.cba.csuohio.edu
However, with those if/else statements, I receive only this:
GET /~yourloginid/index.htm HTTP/1.1
I'm not sure why I can't see the other token. The program does end, which I took to mean that the loop runs until the end of the entire string, right?
Upvotes: 1
Views: 659
Reputation: 4491
strtok stores "the point where the last token was found":
"The point where the last token was found is kept internally by the function to be used on the next call (particular library implementations are not required to avoid data races)." -- reference
That's why you can call it with NULL the second time.
So calling it again with a different pointer inside your loop makes you lose the state of the initial call: the tk = strtok(NULL, "\r\n") at the end of the while continues from wherever the inner loops stopped, not from the next line of buf, so it returns NULL.
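A minimal sketch that reproduces this effect, assuming two hypothetical helpers written only for illustration (count_lines_plain() and count_lines_with_nested_strtok(), each with a 256-byte scratch buffer). The only difference between them is an inner strtok() loop over a copy of each line; that inner loop replaces strtok()'s single saved position, so the outer loop's next strtok(NULL, "\r\n") starts from the end of the copy and finds nothing.

```c
#include <string.h>

/* Counts "\r\n"-separated lines with a plain strtok() loop. */
int count_lines_plain(const char *text)
{
    char buf[256];
    int lines = 0;
    strncpy(buf, text, sizeof buf - 1);
    buf[sizeof buf - 1] = '\0';
    for (char *tk = strtok(buf, "\r\n"); tk != NULL; tk = strtok(NULL, "\r\n"))
        ++lines;
    return lines;
}

/* Same loop, but each line is also split into words with a nested strtok()
 * loop. The nested calls overwrite strtok()'s hidden position, so the outer
 * strtok(NULL, ...) continues from the end of `line` and returns NULL. */
int count_lines_with_nested_strtok(const char *text)
{
    char buf[256], line[256];
    int lines = 0;
    strncpy(buf, text, sizeof buf - 1);
    buf[sizeof buf - 1] = '\0';
    char *tk = strtok(buf, "\r\n");
    while (tk != NULL) {
        ++lines;
        strncpy(line, tk, sizeof line - 1);   /* inner work on a copy */
        line[sizeof line - 1] = '\0';
        for (char *w = strtok(line, " "); w != NULL; w = strtok(NULL, " ")) {
            /* per-word processing would go here */
        }
        tk = strtok(NULL, "\r\n");            /* state now belongs to `line` */
    }
    return lines;
}
```

With the request string from the question, count_lines_plain() finds both lines, while count_lines_with_nested_strtok() stops after the first one, which matches the output described in the question.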
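So one solution is to change the last line of the while from:
tk = strtok(NULL, "\r\n");
to something like:
tk = strtok(tk + strlen(tk) + 1, "\r\n");
Please check the bounds first: the new starting point must not go past the end of the original string. Save that end (buf + strlen(buf)) before the first strtok() call, because strtok() overwrites delimiters with '\0', so strlen(buf) afterwards only measures the first token. This also assumes the processing code leaves tk itself untouched: the strtok(tk, " ") in the c == 1 branch writes '\0' into the line, so tokenize a copy there, as the c == 0 branch already does with buf2.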
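A minimal sketch of this restart approach, assuming a hypothetical helper count_lines_restarting() that only counts lines and a 256-byte scratch buffer for the inner tokenization (both illustrative). The end of the original string is saved before any strtok() call, each outer call restarts just past the previous token's '\0', and the inner loop works on a copy of the line.

```c
#include <string.h>

/* Counts "\r\n"-separated lines in buf (modified in place) while running a
 * nested strtok() loop on a copy of each line. The outer loop never passes
 * NULL, so the inner loop cannot derail it. */
int count_lines_restarting(char *buf)
{
    char *end = buf + strlen(buf);   /* original end: must be taken before strtok() */
    char line[256];
    int lines = 0;
    char *tk = strtok(buf, "\r\n");
    while (tk != NULL) {
        ++lines;
        strncpy(line, tk, sizeof line - 1);   /* inner tokenization on a copy */
        line[sizeof line - 1] = '\0';
        for (char *w = strtok(line, " "); w != NULL; w = strtok(NULL, " ")) {
            /* per-word processing would go here */
        }
        char *next = tk + strlen(tk) + 1;     /* just past this token's '\0' */
        tk = (next < end) ? strtok(next, "\r\n") : NULL;   /* bounds check */
    }
    return lines;
}
```

The bounds check matters when the string does not end with a delimiter: for the last token, tk + strlen(tk) + 1 then lies one past the terminating '\0', and it must not be handed to strtok().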
Or use strtok_r, which stores the state externally in a pointer you supply (like in this answer):
char *saveptr1;  // state of the outer tokenization
// first call
tk = strtok_r(buf, "\r\n", &saveptr1);
while(tk != NULL) {
    //...
    tk = strtok_r(NULL, "\r\n", &saveptr1);
}
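To illustrate that strtok_r keeps no hidden state, here is a minimal sketch (the helper name interleave_tokens() and the comma-separated inputs are hypothetical) that walks two strings in lockstep, each tokenization with its own save pointer. With plain strtok() this interleaving would not work, because starting on the second string would discard the position in the first.

```c
#define _POSIX_C_SOURCE 200809L   /* expose strtok_r() under strict compiler modes */
#include <string.h>

/* Alternates between two independent strtok_r() tokenizations of a and b
 * (both modified in place), storing up to max token pointers in out.
 * Returns the number of tokens stored. */
size_t interleave_tokens(char *a, char *b, const char **out, size_t max)
{
    char *save_a, *save_b;   /* one state pointer per tokenization */
    char *ta = strtok_r(a, ",", &save_a);
    char *tb = strtok_r(b, ",", &save_b);
    size_t n = 0;
    while ((ta != NULL || tb != NULL) && n < max) {
        if (ta != NULL) {
            out[n++] = ta;
            ta = strtok_r(NULL, ",", &save_a);
        }
        if (tb != NULL && n < max) {
            out[n++] = tb;
            tb = strtok_r(NULL, ",", &save_b);
        }
    }
    return n;
}
```

For a = "x,y,z" and b = "1,2" the collected tokens are x, 1, y, 2, z: each call resumes exactly where its own save pointer left off.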
Upvotes: 4
Reputation: 1439
One thing that stands out to me is that unless you are doing something else with the string buffer, there is no need to copy each token to its own buffer. The strtok function returns a pointer to the beginning of the token, so you can use the token in place. The following code may work better and be easier to understand:
#include <stdio.h>
#include <string.h>
#define MAX_PTR 4
int main(void)
{
    char buff[] = "GET /~yourloginid/index.htm HTTP/1.1\r\nHost: remote.cba.csuohio.edu\r\n\r\n";
    char *ptr[MAX_PTR];
    int i;
    for (i = 0; i < MAX_PTR; i++)
    {
        if (i == 0) ptr[i] = strtok(buff, "\r\n");   /* first call: start on buff */
        else ptr[i] = strtok(NULL, "\r\n");          /* later calls: continue */
        if (ptr[i] != NULL) printf("%s\n", ptr[i]);
    }
    return 0;
}
The way I defined the buffer is something I call a pre-loaded buffer: initializing a char array from a string literal copies the string into the array, and the compiler sizes the array for you. It has to be an array (char buff[] = "...") rather than a pointer to the literal (char *buff = "..."), because strtok writes '\0' characters into the string and string literals must not be modified. Inside the for loop, the if statement chooses which form of strtok to call: when i == 0 we pass buff to start the tokenization, and for every later token we pass NULL so strtok continues where it left off. The printf then prints each token. Remember, strtok returns a pointer to a spot inside the buffer; nothing is copied.
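A small sketch of that last point, using a hypothetical helper tokens_point_into_buffer(): it tokenizes the array in place and checks that every pointer strtok returns lies inside the array. It assumes the same pre-loaded char array; afterwards buff itself reads as just the first token, because strtok replaced the \r after it with '\0'.

```c
#include <stddef.h>
#include <string.h>

/* Tokenizes buf (a modifiable array of `size` bytes) on "\r\n" and returns 1
 * if every token pointer strtok() returned lies inside buf, 0 otherwise.
 * The number of tokens is stored in *count. */
int tokens_point_into_buffer(char *buf, size_t size, size_t *count)
{
    char *begin = buf, *end = buf + size;
    int inside = 1;
    *count = 0;
    for (char *tk = strtok(buf, "\r\n"); tk != NULL; tk = strtok(NULL, "\r\n")) {
        ++*count;
        if (tk < begin || tk >= end)   /* never true: tokens are not copies */
            inside = 0;
    }
    return inside;
}
```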
If you really are doing something else with the data and you really do need the buffer for other things, then the following code will work as well. This uses malloc to allocate blocks of memory from the heap.
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#define MAX_PTR 4
int main(void)
{
    char buff[] = "GET /~yourloginid/index.htm HTTP/1.1\r\nHost: remote.cba.csuohio.edu\r\n\r\n";
    char *ptr[MAX_PTR];
    char *bptr; /* buffer pointer */
    size_t len;
    int i;
    for (i = 0; i < MAX_PTR; i++)
    {
        if (i == 0) bptr = strtok(buff, "\r\n");
        else bptr = strtok(NULL, "\r\n");
        if (bptr != NULL)
        {
            len = strlen(bptr);
            ptr[i] = malloc(len + 2);   /* token + '\0' + one spare byte */
            if (ptr[i] == NULL)
            {
                /* malloc failed: report and exit */
                fprintf(stderr, "Error: Memory Allocation Failed. i=%d\n", i);
                exit(1);
            }
            strncpy(ptr[i], bptr, len + 1);   /* copies the token and its '\0' */
            ptr[i][len + 1] = '\0';           /* extra precaution: last byte is 0 */
            printf("%s\n", ptr[i]);
        }
        else ptr[i] = NULL;
    }
    for (i = 0; i < MAX_PTR; i++)
        free(ptr[i]);   /* free(NULL) is a no-op */
    return 0;
}
This code does much the same thing, except that it copies each token into its own heap buffer, and the array of char pointers holds those buffers. The malloc call allocates the memory, and then we check whether it failed: if malloc returns NULL, the program reports the error and exits. The strncpy function should be used instead of strcpy: strcpy does not allow for checking the size of the target buffer, so a malicious user can execute a buffer overflow attack on your code. The malloc is given strlen(bptr) + 2 bytes, which is guaranteed to be big enough for the token plus its terminating '\0'. Passing strlen(bptr) + 1 to strncpy copies exactly the token and that terminator, so the copied data cannot overrun the buffer. As an added precaution, the last byte in the buffer is set to 0x00. Then we print the string. Note the if (bptr != NULL): the main block of code runs only when strtok returns a pointer to a valid token; otherwise the corresponding entry in the array is set to NULL.
Don't forget to free() the pointers in the array when you are done with them.
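A sketch of that cleanup, together with an alternative copy helper. Both copy_token() and free_tokens() are hypothetical names; copy_token() swaps the strncpy approach for malloc(strlen + 1) followed by memcpy, which is enough when the destination is sized from the source string.

```c
#include <stdlib.h>
#include <string.h>

/* Returns a heap copy of tok sized exactly for its characters plus the
 * terminating '\0', or NULL if malloc fails. */
char *copy_token(const char *tok)
{
    size_t len = strlen(tok) + 1;   /* include the '\0' */
    char *copy = malloc(len);
    if (copy != NULL)
        memcpy(copy, tok, len);
    return copy;
}

/* Frees ptr[0..n-1] and resets each entry to NULL. free(NULL) does nothing,
 * so slots left NULL because strtok ran out of tokens are safe. */
void free_tokens(char **ptr, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        free(ptr[i]);
        ptr[i] = NULL;
    }
}
```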
In your code, you are placing things in named buffers, which can be done, but it's not really good practice because then if you try to use the code somewhere else, you have to make extensive modifications to it.
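One way to make the tokenizing reusable is a small function that takes the buffer and delimiters as parameters instead of working on fixed, named buffers. The sketch below (split_tokens() is a hypothetical name) uses plain strtok(), so it must not be called while another strtok() loop is in progress. Calling it once to split the request into lines, and then once per line, also sidesteps the nested-strtok problem from the question, because each tokenization finishes before the next one starts.

```c
#include <stddef.h>
#include <string.h>

/* Splits buf in place on any character in delims and stores up to max token
 * pointers in out. Returns the number of tokens stored. The tokens point into
 * buf, so buf must stay alive (and unmodified) while they are used. */
size_t split_tokens(char *buf, const char *delims, char **out, size_t max)
{
    size_t n = 0;
    char *tk = strtok(buf, delims);
    while (tk != NULL && n < max) {
        out[n++] = tk;
        tk = strtok(NULL, delims);
    }
    return n;
}
```

For example, splitting the request on "\r\n" gives two lines, and splitting the second line on " " then gives "Host:" and "remote.cba.csuohio.edu".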
Upvotes: 0
Reputation: 126203
strtok stores the state of the last token in a global variable, so that the next call to strtok knows where to continue. So when you call strtok(buf2, "/"); in the if, it clobbers the saved state of the outer tokenization.
The fix is to use strtok_r instead of strtok. This function takes an extra argument that is used to store the state:
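#define _POSIX_C_SOURCE 200809L   /* for strtok_r() */
#include <stdio.h>
#include <string.h>
int main(void)
{
    int c = 0, c2 = 0;
    char *tk, *tk2, *tk3;
    char buf[1024], buf2[1024];
    char host[1024], path[1024], file[1024];
    char *save1, *save2, *save3;   /* one state pointer per tokenization */
    strcpy(buf, "GET /~yourloginid/index.htm HTTP/1.1\r\nHost: remote.cba.csuohio.edu\r\n\r\n");
    tk = strtok_r(buf, "\r\n", &save1);
    while (tk != NULL) {
        printf("%s\n", tk);
        if (c == 0) {
            strcpy(buf2, tk);
            c2 = 0;   /* restart the field counter for this line */
            tk2 = strtok_r(buf2, "/", &save2);
            while (tk2 != NULL) {
                if (c2 == 1)
                    strcpy(path, tk2);
                else if (c2 == 2) {
                    tk3 = strtok_r(tk2, " ", &save3);
                    strcpy(file, tk3);
                }
                ++c2;
                tk2 = strtok_r(NULL, "/", &save2);
            }
        } else if (c == 1) {
            c2 = 0;   /* without this reset, c2 is still 4 from the first line */
            tk3 = strtok_r(tk, " ", &save2);
            while (tk3 != NULL) {
                if (c2 == 1) {
                    printf("%s\n", tk3);
                    // strcpy(host, tk3);
                    // printf("%s\n", host);
                }
                ++c2;
                tk3 = strtok_r(NULL, " ", &save2);
            }
        }
        ++c;
        tk = strtok_r(NULL, "\r\n", &save1);
    }
    return 0;
}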
Upvotes: 0