Jack

Reputation: 35

strtok returns NULL despite not having reached the end of the string

I am writing a program that parses input from stdin and calls functions according to the input. The inputs my program is supposed to handle are the following:

end //stops the program
report //prints a specific output
addent "ent_id"
delent "ent_id"
addrel "ent_id1" "ent_id2" "rel_id"
delrel "ent_id1" "ent_id2" "rel_id"

The functions called by the input are not relevant to my issue, but do note that all the arguments passed to the functions are between quotation marks.

Here's the code:

int main() {
    const char Comando[6][7] = { "addrel", "addent", "delrel", "delent", "report", "end" };
    const char spazio[2] = " ";
    const char newline[3] = "\n";
    const char quote[2] = "\"";
    char sample[100];
    char *temp;
    char *comandoIN;
    char *argomento1;
    char *dest;
    char *rel;

    RelHead = NULL;
    init_array();

    char *str = fgets(sample, 100, stdin);

    for (;;) {
        if (strncmp(sample, Comando[5], 3) == 0) {
            return 0;
        } else if (strncmp(sample, Comando[4], 6) == 0) {
            report();
        } else {
            temp = strtok(sample, newline);
            comandoIN = strtok(temp, spazio);
            argomento1 = strtok(NULL, quote);

            if (strncmp(Comando[1], comandoIN, 7) == 0) {
                addent(argomento1);
            } else if (strncmp(Comando[3], comandoIN, 7) == 0) {
                delent(argomento1);
            } else {
                temp = strtok(NULL, quote);
                dest = strtok(NULL, quote);
                temp = strtok(NULL, quote);
                rel = strtok(NULL, quote);

                if (strncmp(Comando[0], comandoIN, 7) == 0) {
                    addrel(argomento1, dest, rel);
                } else if (strncmp(Comando[2], comandoIN, 7) == 0) {
                    delrel(argomento1, dest, rel);
                }
            }
        }

        char *str = fgets(sample, 69, stdin);
    }
    return 0;
}

The incorrect behavior is caused by the following input:

addrel "The_Ruler_of_the_Universe" "The_Lajestic_Vantrashell_of_Lob" "knows"

which causes the last two calls of strtok to return NULL instead of " " (whitespace) and "knows" respectively (without quotation marks). Furthermore, if this is the first input given to the program, it behaves correctly, and if it's the last, the following cycle will put "knows" in the "comandoIN" variable. This is the only input I've found so far that causes this issue, and I think it has something to do with removing the newline character with the first call of strtok.

This is an assignment for uni, so we have several inputs to test the program, and my program passes the first 4 of these (the tests are about 200 inputs each), so I don't really understand what's causing the bug. Any ideas?

Upvotes: 1

Views: 243

Answers (2)

chqrlie

Reputation: 144951

Using strtok for parsing the command line with different sets of separators is confusing and error prone. It would be simpler to parse the command line with a simple loop and handle spaces and quotes explicitly, then dispatch on the first word.

Here is a more systematic approach:

#include <stdio.h>
#include <string.h>   // for strcmp

char *getarg(char **pp) {
    char *p = *pp;
    char *arg = NULL;
    while (*p == ' ')
        p++;
    if (*p == '\0' || *p == '\n')
        return arg;
    if (*p == '"') {
        arg = ++p;
        while (*p != '\0' && *p != '"')
            p++;
        if (*p == '"')
            *p++ = '\0';
    } else {
        arg = p++;
        while (*p != '\0' && *p != ' ' && *p != '\n')
            p++;
        if (*p != '\0')
            *p++ = '\0';
    }
    *pp = p;
    return arg;
}

int main() {
    char sample[100];
    char *cmd, *arg1, *arg2, *arg3;

    RelHead = NULL;
    init_array();

    while (fgets(sample, sizeof sample, stdin)) {
        char *p = sample;
        cmd = getarg(&p);
        arg1 = getarg(&p);
        arg2 = getarg(&p);
        arg3 = getarg(&p);

        if (cmd == NULL) {  // empty line
            continue;
        } else
        if (!strcmp(cmd, "end")) {
            break;
        } else
        if (!strcmp(cmd, "report")) {
            report();
        } else
        if (!strcmp(cmd, "addent")) {
            addent(arg1);
        } else
        if (!strcmp(cmd, "delent")) {
            delent(arg1);
        } else
        if (!strcmp(cmd, "addrel")) {
            addrel(arg1, arg2, arg3);
        } else
        if (!strcmp(cmd, "delrel")) {
            delrel(arg1, arg2, arg3);
        } else {
            printf("invalid command\n");
        }
    }
    return 0;
}

Upvotes: 1

Stephan Schlecht

Reputation: 27126

The problem here is that the input:

addrel "The_Ruler_of_the_Universe" "The_Lajestic_Vantrashell_of_Lob" "knows"

is 76 characters long, or 77 with the trailing newline that fgets stores, so the buffer needs at least 78 bytes including the terminating null character.

At the end of your loop you have:

char *str = fgets(sample, 69, stdin);

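where you tell fgets that the buffer is only 69 bytes long. fgets then stores at most 68 characters, so the line is cut right after the closing quote of the second argument. strtok finds nothing after that quote and returns NULL, and the remaining ` "knows"` is read as a separate line on the next iteration.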
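To see the split in isolation, here is a minimal sketch. The helper name first_read_length is hypothetical (it is not part of the question's code); it writes a line to a temporary file, reads it back with a given fgets limit, and reports how many characters the first call stored.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper, for illustration only: writes `line` to a temporary
   file, reads it back with fgets(buf, limit, ...) and returns the number of
   characters stored by that first call (0 if the file cannot be created). */
size_t first_read_length(const char *line, char *buf, int limit) {
    assert(line != NULL && buf != NULL && limit > 0);

    FILE *fp = tmpfile();
    size_t len = 0;

    if (fp == NULL)
        return 0;

    fputs(line, fp);
    rewind(fp);

    /* fgets stores at most limit - 1 characters plus the terminator */
    if (fgets(buf, limit, fp) != NULL)
        len = strlen(buf);

    fclose(fp);
    return len;
}
```

Called with the problematic line (including its newline) and a limit of 69, this should report 68 characters with no newline in buf; with a limit of 100 the whole line, newline included, fits.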

Why does it behave correctly if it is the first input?

Before the for loop you have:

char *str = fgets(sample, 100, stdin);
for(;;)
...

Here you pass a size of 100, so the whole line fits when it is the first input given after starting the program.
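One possible way to avoid the mismatch is to always pass the real buffer size (for example sizeof sample) and to check whether a complete line was actually read. The sketch below is only an assumption about how you might structure this; read_line and its return codes are hypothetical names, not something from your program.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: reads one line into buf, which holds `size` bytes.
   Returns  1 if a complete line was read (the newline is stripped),
            0 at end of input,
           -1 if the line did not fit; the rest of that line is discarded. */
int read_line(char *buf, size_t size, FILE *fp) {
    assert(buf != NULL && fp != NULL && size > 1);

    if (fgets(buf, (int)size, fp) == NULL)
        return 0;

    size_t len = strlen(buf);
    if (len > 0 && buf[len - 1] == '\n') {
        buf[len - 1] = '\0';
        return 1;
    }

    /* No newline stored: either the input ended here, the line filled the
       buffer exactly, or the line is too long for the buffer. */
    int c = getc(fp);
    if (c == EOF || c == '\n')
        return 1;

    /* Too long: skip the remainder so the next call starts on a new line */
    while ((c = getc(fp)) != EOF && c != '\n')
        ;
    return -1;
}
```

In the question's loop this could replace both fgets calls, for example `while ((rc = read_line(sample, sizeof sample, stdin)) != 0)`, skipping or reporting the line when rc is -1.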

Upvotes: 3
