Reputation: 2157
I am attempting to write a very basic lexer in C and have the following code, which is supposed to do something like the following:
Input: "12 14 123"
Output:
NUMBER -- 12
NUMBER -- 14
NUMBER -- 123
However, I am having an issue where if I do not include an initial printf("") statement before looping over the input, then I will get an output like this:
Output:
NUMBER --
NUMBER -- 14
NUMBER -- 123
where the first number is simply blank. I am really confused as to why this is happening and would really appreciate some help with this!
I have the following code (with a number of irrelevant functions omitted):
    #define MAX_LEN 400

    char* input;
    char* ptr;
    char curr_type;
    char curr;

    enum token_type {
        END,
        NUMBER,
        UNEXPECTED
    };

    typedef struct {
        enum token_type type;
        char* str;
    } Token;

    void print_tok(Token t) {
        printf("%s -- %s\n", token_types[t.type], t.str);
    }

    char get(void) {
        return *ptr++;
    }

    char peek(void) {
        return *ptr;
    }

    Token number(void) {
        char arr[MAX_LEN];
        arr[0] = peek();
        get();
        int i = 1;
        while (is_digit(peek())) {
            arr[i] = get();
            ++i;
        }
        arr[++i] = '\0';
        Token ret = {NUMBER, (char*)arr};
        return ret;
    }

    Token unexpected(void) {
        // omitted
    }

    Token next(void) {
        while (is_space(peek())) get();
        char c = peek();
        switch (peek()) {
            case '0':
            // omitted
            case '9':
                return number();
            default:
                return unexpected();
        }
    }

    int main(int argc, char **argv) {
        printf(""); // works fine with this line
        input = argv[1];
        ptr = input;
        Token tokens[MAX_LEN];
        Token t;
        int i = 0;
        do {
            t = next();
            print_tok(t);
            tokens[i++] = t;
        } while (t.type != END && t.type != UNEXPECTED);
        return 0;
    }
Upvotes: 0
Views: 222
Reputation: 58868
In number, arr is a local variable. Its storage stops existing when number returns, and its contents are unpredictable after that. Nonetheless, your program then prints it through the pointer stored in the Token struct, which is undefined behavior.
The value that is printed is unpredictable. The extra printf("") statement may cause the compiler to lay out the code or the stack differently, so that the old contents of arr happen not to be overwritten before they are printed. You cannot rely on it.
You have several other options to allocate memory per token:
- Change str in Token so it's an array of chars instead of a pointer. Then each token has its own space to store the string.
- Allocate the string with malloc. Then it stays allocated until you free it.
- Declare the array in main so it's valid for both next and print_tok. You'd have to give next a pointer to the array, so it knows where it should store the string. This would only store one token's string at a time.
- [...] next.
- [...] Token which stores how long the token is.
I think the first option is easiest and the last option uses the least memory, but I included some other options for completeness.
Upvotes: 2