Jean

Reputation: 1490

C tokenize and store into array

I have a file. I want to read each line, split it on tabs, and store the tokens in an array. It turns out that tokens[0]..tokens[4] only hold pointers into the buffer that strtok() tokenizes, so what they show changes each time I call strtok on the next line of the file. How do I fix this? If I declare char tokens[MAX_SIZE] instead of char *tokens[MAX_SIZE], I get a conversion error because strtok returns char *.

The file is

20  34  90  10  77
80  12  37  29  63
45  21  55  18  46

My code is:

FILE *f;
if ((f = fopen("myinput.txt","r")) == NULL) {
    perror("Failed to open file:");
    return -1;
}
char *line = NULL;  /* getline allocates the buffer when this is NULL */
size_t len = 0;
char *tokens[MAX_SIZE];
int i = 0;
while (getline(&line, &len, f) != -1) {
    char *lineWithoutNullByte = strtok(line, "\n");
    tokens[i] = strtok(lineWithoutNullByte, "\t");
    i++;
    int x = 1;
    while (x) {
        tokens[i] = strtok(NULL, "\t");
        if (tokens[i] == NULL) {
            x = 0;
        } else {
            i++;
        }
    }
    printf("test: %s %s %s %s %s\n", tokens[0], tokens[1], tokens[2], tokens[3], tokens[4]);
}

The expected output is

    test: 20 34 90 10 77
    test: 20 34 90 10 77
    test: 20 34 90 10 77

But I am getting:

    test: 20 34 90 10 77
    test: 80 12 37 29 63
    test: 45 21 55 18 46

To clarify: this means that if I print the entire tokens array, I get

45 21 55 18 46
45 21 55 18 46
45 21 55 18 46

Upvotes: 4

Views: 3221

Answers (2)

Sergey Kalinichenko

Reputation: 726609

You are not using the tokens that you get from strtok correctly: they are pointers into the buffer that getline fills, not separate strings. The first call to getline allocates a new buffer; subsequent calls write into the same buffer, because each line fits into the space already allocated.
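The effect can be reproduced without a file. The sketch below is only an illustration: stale_token_demo is a hypothetical helper, and a fixed 64-byte local array stands in for the buffer that getline would reuse.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical demo, not part of the question's code.
 * Tokenizes `first` in a local buffer and keeps the pointer to its
 * first token, then reuses the same buffer for `second`, much as
 * getline reuses its allocation. Whatever the kept pointer reads
 * afterwards is copied into `seen`.
 * Assumes both inputs are shorter than 64 bytes. */
void stale_token_demo(const char *first, const char *second,
                      char *seen, size_t seen_size)
{
    assert(first != NULL && second != NULL && seen != NULL);
    char buf[64];

    snprintf(buf, sizeof buf, "%s", first);
    char *kept = strtok(buf, "\t");          /* points into buf, no copy made */

    snprintf(buf, sizeof buf, "%s", second); /* buffer reused for the next line */
    strtok(buf, "\t");                       /* tokenize the new contents */

    snprintf(seen, seen_size, "%s", kept != NULL ? kept : "");
}
```

Calling stale_token_demo("20\t34", "80\t12", seen, sizeof seen) leaves "80" in seen: the kept pointer never moved, but the bytes it points at were replaced.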

Since you store pointers into that buffer, the next time a line with new data is placed into the old space, all tokens pointing to that address will "see" the new data. To avoid this problem, you need to copy the tokens right after taking them from strtok, for example, by passing them to strdup:

char *tmp = strtok(NULL, "\t");
if (tmp == NULL) {
    x = 0;
    tokens[i] = NULL;
} else {
    tokens[i] = strdup(tmp);  /* store a private copy of the token */
    i++;                      /* then advance to the next free slot */
}

You would need to strdup the first token as well.
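One way to treat the first token and the remaining tokens uniformly is to move the copying into a helper. This is a minimal sketch, not the only way to do it: split_tabs_copy and free_tokens are hypothetical names, and the helper passes "\t\n" as the delimiter set so the trailing newline is dropped without a separate strtok call.

```c
#define _POSIX_C_SOURCE 200809L  /* for strdup */
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: splits `line` on tabs (dropping a trailing
 * newline), stores a heap copy of each token in out[], and returns
 * the number of tokens stored, at most `max`.
 * `line` is modified in place by strtok. */
size_t split_tabs_copy(char *line, char *out[], size_t max)
{
    assert(line != NULL && out != NULL);
    size_t n = 0;

    for (char *tok = strtok(line, "\t\n");
         tok != NULL && n < max;
         tok = strtok(NULL, "\t\n")) {
        char *copy = strdup(tok);  /* private copy, survives buffer reuse */
        if (copy == NULL)
            break;                 /* out of memory: stop early */
        out[n++] = copy;
    }
    return n;
}

/* Frees the copies made by split_tabs_copy. */
void free_tokens(char *tokens[], size_t n)
{
    for (size_t i = 0; i < n; i++)
        free(tokens[i]);
}
```

In the question's loop this could be called as i += split_tabs_copy(line, tokens + i, MAX_SIZE - i); with free_tokens(tokens, i) once the tokens are no longer needed. The line pointer passed to getline should start out as NULL so that getline allocates the buffer itself.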

Notes: if you take this approach, you need to free the individual tokens once your program is done with them. You also need to free the buffer returned by getline once the outer while loop has finished, not inside it, because getline keeps reusing that buffer:

free(line);

In addition, strtok is non-reentrant, meaning that it cannot be used in concurrent environments, or even to tokenize strings in nested loops. You should use strtok_r instead.
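As an illustration of the nested case, the sketch below walks a multi-line buffer with two independent strtok_r scans. count_fields is a hypothetical helper; strtok_r is the POSIX function and takes a third argument that holds the scan position for each loop.

```c
#define _POSIX_C_SOURCE 200809L  /* for strtok_r */
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: counts the tab-separated fields in a buffer
 * that holds several newline-terminated lines.
 * `text` is modified in place. */
size_t count_fields(char *text)
{
    assert(text != NULL);
    size_t count = 0;
    char *line_state;   /* scan position of the outer (line) loop */
    char *field_state;  /* scan position of the inner (field) loop */

    for (char *line = strtok_r(text, "\n", &line_state);
         line != NULL;
         line = strtok_r(NULL, "\n", &line_state)) {
        /* The inner scan has its own state, so it does not disturb
         * the outer scan the way a nested strtok call would. */
        for (char *field = strtok_r(line, "\t", &field_state);
             field != NULL;
             field = strtok_r(NULL, "\t", &field_state)) {
            count++;
        }
    }
    return count;
}
```

With plain strtok, the inner loop would overwrite the single hidden scan position, so the outer loop would typically stop after the first line.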

Upvotes: 3

bm2i

Reputation: 75

You should use strtok_r instead of strtok, because for me strtok only worked the first time. I don't know the reason, but I ran into this problem once.

Upvotes: 0
