Wahoozel

Reputation: 297

Passing string array to function and modifying it corrupts it

I'm currently writing a small programming language in C. I load a script from a file and then tokenize it with strtok. Inside the tokenizer function the output is fine and gives me the expected results, but once the function returns to main the data is corrupted, giving me output like:

╠ a

Instead of the expected:

int

I don't have much experience in C, but my best guess is that this is a null-termination issue, although I've read that strtok null-terminates each token automatically.

Below is the relevant code:

int tokenize(char* script, char* tokens[]) {
    char buffer[256];
    strcpy(buffer, script);

    char* token = strtok(buffer, " ");
    int i = 0;
    while (token) {
        tokens[i] = token;
        printf("token: %s\n", token);
        token = strtok(NULL, " ");
        i++;
    }
    printf("First token (tokenize): %s\n", tokens[0]);

    return i;
}


int main(int argc, char* argv[]) {
    // ....
    char* script = read_script(argv[1]);

    char* tokens[256];
    int token_count = tokenize(script, tokens);
    printf("First token (main): %s\n", tokens[0]);
    // ...
}

And here is the console output:

token: int
token: i
token: =
token: 0

First token (tokenize): int
First token (main): ╠ a

Upvotes: 0

Views: 74

Answers (3)

양기창

Reputation: 169

The memory (buffer) that tokens points into is located on the stack, so it disappears when the tokenize function returns. Your tokens entries must therefore point into heap memory (allocated with malloc or calloc).
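A minimal sketch of this suggestion, assuming the caller is willing to take ownership of one heap buffer: tokenize copies script into memory from malloc, splits that copy in place with strtok, and hands the buffer back through an out-parameter so the caller can free it later. The `storage` and `max_tokens` parameters are hypothetical additions, not part of the original code.

```c
#include <stdlib.h>
#include <string.h>

/* Splits a heap copy of script on spaces. Every tokens[i] points into
 * *storage, which stays valid after return because it lives on the heap.
 * The caller must free(*storage) once the tokens are no longer needed.
 * Returns the token count, or -1 if the allocation fails. */
int tokenize(const char *script, char *tokens[], int max_tokens, char **storage) {
    size_t len = strlen(script);
    char *buffer = malloc(len + 1);      /* heap, not stack */
    if (buffer == NULL) {
        *storage = NULL;
        return -1;
    }
    memcpy(buffer, script, len + 1);     /* copy including the '\0' */

    int i = 0;
    char *token = strtok(buffer, " ");
    while (token != NULL && i < max_tokens) {
        tokens[i++] = token;             /* points into the heap buffer */
        token = strtok(NULL, " ");
    }
    *storage = buffer;
    return i;
}
```

In main this would be called as `tokenize(script, tokens, 256, &storage)`, with a single `free(storage)` after the last use of tokens. Because all tokens share one allocation, there is only one pointer to free.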

Upvotes: 0

Stephan Lechner

Reputation: 35154

Your strtok calls operate on the local variable buffer[256], and the memory reserved for this variable is no longer valid once tokenize returns. Hence every pointer in tokens[] points to invalid memory.

To overcome this, I'd write

    tokens[i] = strdup(token);

instead of

    tokens[i] = token;

Make sure that the caller frees the memory reserved this way for each tokens[i] element once these elements are no longer needed.
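A sketch of the strdup variant together with the matching cleanup, assuming a fixed `max_tokens` limit and a helper named `free_tokens` (both hypothetical, not from the original post). strdup is POSIX and only entered ISO C with C23, hence the feature-test macro. The length check is an extra guard against overflowing the 256-byte buffer, which the original strcpy does not have.

```c
#define _POSIX_C_SOURCE 200809L  /* exposes strdup on older C libraries */
#include <stdlib.h>
#include <string.h>

/* Each tokens[i] receives its own heap copy, so it outlives the local
 * buffer. Returns the number of tokens, or -1 on error. */
int tokenize(const char *script, char *tokens[], int max_tokens) {
    char buffer[256];                 /* scratch space, only used inside this call */
    if (strlen(script) >= sizeof buffer)
        return -1;                    /* script would overflow buffer */
    strcpy(buffer, script);

    int i = 0;
    for (char *token = strtok(buffer, " ");
         token != NULL && i < max_tokens;
         token = strtok(NULL, " ")) {
        tokens[i] = strdup(token);    /* independent copy on the heap */
        if (tokens[i] == NULL) {      /* out of memory: undo and report */
            while (i > 0)
                free(tokens[--i]);
            return -1;
        }
        i++;
    }
    return i;
}

/* Releases the copies made by tokenize(). */
void free_tokens(char *tokens[], int count) {
    for (int i = 0; i < count; i++)
        free(tokens[i]);
}
```

The trade-off compared with a single heap buffer is one allocation per token, but each token can then be freed, kept, or reassigned independently.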

Upvotes: 1

Andrew Henle

Reputation: 1

You're returning pointers that point to tokens in a local variable:

int tokenize(char* script, char* tokens[]) {
    char buffer[256];
    ...

buffer no longer exists once tokenize() returns.
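One way to avoid the dangling pointers without any per-token allocation, sketched here as an assumption rather than something this answer prescribes: let the caller own the storage and have tokenize split it in place. main would declare the buffer (for example a `char buffer[256]`, a hypothetical name) in its own scope and copy the script into it, so the tokens stay valid for as long as main runs. The `max_tokens` parameter is likewise hypothetical.

```c
#include <string.h>

/* Splits buffer in place. The buffer is owned by the caller, so the
 * pointers stored in tokens[] remain valid for as long as the caller's
 * buffer does. strtok overwrites each delimiter with '\0', so the
 * contents of buffer are modified. */
int tokenize(char *buffer, char *tokens[], int max_tokens) {
    int i = 0;
    for (char *token = strtok(buffer, " ");
         token != NULL && i < max_tokens;
         token = strtok(NULL, " "))
        tokens[i++] = token;          /* points into the caller's buffer */
    return i;
}
```

If read_script returns a modifiable buffer that main keeps alive (its implementation isn't shown, so this is an assumption), that buffer could be passed directly instead of a copy.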

Upvotes: 2
