Reputation: 762
int i = 0;
while (fgets(lineStr, sizeof(lineStr), pFile) != NULL) {
    puts(lineStr);
    pch = strtok(lineStr, delim);
    while (pch != NULL) {
        printf("%s\n", pch);
        pch = strtok(NULL, delim);
    }
}
Overview: I'm trying to write a miniature version of grep (i.e., find the number of occurrences of a word in a text file). The entire code is at http://pastebin.com/VzTJkLK3
Problem: I'm trying to use strtok to tokenize an array of characters representing a line of text. Running under gdb, I get a segmentation fault like this:
Program received signal SIGSEGV, Segmentation fault.
__strlen_sse2 () at ../sysdeps/x86_64/multiarch/../strlen.S:31
31 ../sysdeps/x86_64/multiarch/../strlen.S: No such file or directory.
Any hints or links to more documentation are appreciated.
PS: I was told that using strtok is not a good programming practice - I'm a noob in C btw. What alternative would you recommend?
Upvotes: 3
Views: 1085
Reputation: 66194
Your code doesn't include string.h, which provides the prototypes of both strlen() and strtok(). The resulting behaviour is an interesting "feature" provided for legacy C compilation: the implicit declaration.
In C (before C99 removed the rule, though most compilers still accept it), if you don't declare a function before using it in a translation unit, the compiler will dutifully generate a declaration for you, with a default return type of int. This can often be a huge problem, and any decent compiler worth its salt will at least give you a warning about it, something to the effect of "warning: implicit declaration of function 'foo'".
So why is that such a bummer? Well, without including string.h, the compiler assumes the two functions you're using, strlen() and strtok(), look like this:
int strlen();
int strtok();
This declares two functions, both returning int and accepting an unspecified number of parameters. Another "helpful" feature of C for invoking such functions is that you can pass anything you want to them as arguments. The compiler will happily pass them by value (on the stack or in registers, depending on the platform's calling convention):
int n = strlen(str); // passes one char*, then makes the call.
and similar, but not quite the same:
char *p = strtok(str, delim); // passes two char*, then makes the call.
So why did strlen seem to work, while strtok faulted? Because on your platform, int (the implied return type of your undeclared strtok() function) is not the same size as char*, the type of the variable where you store the return value. In all likelihood you're on a 64-bit platform where int is 32 bits but pointers are 64 bits.
Therefore, only half the pointer is saved; the other half (32 bits) is not retained. The resulting pointer is invalid, and therefore kerboom.
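To make the lost half concrete, here is a minimal sketch of what the caller effectively does with the returned pointer. It assumes the typical x86-64 behaviour, where the low 32 bits are read as an int and then sign-extended back to pointer width. The function name simulate_truncated_return is hypothetical, and the real outcome is undefined behaviour, so a given compiler may do something else.

```c
#include <stdint.h>

/* Hypothetical helper: model a 64-bit pointer value being returned through
   an implicitly declared "int f()" and then widened back to 64 bits.
   Assumption: the caller keeps the low 32 bits and sign-extends them. */
uint64_t simulate_truncated_return(uint64_t full)
{
    uint32_t low = (uint32_t)full;               /* top 32 bits are discarded */
    int64_t widened = (low & 0x80000000u)
        ? (int64_t)low - 0x100000000LL           /* sign bit set: extend with ones */
        : (int64_t)low;                          /* sign bit clear: extend with zeros */
    return (uint64_t)widened;
}
```

Under that assumption, an address such as 0x00007ffff7dd1234 comes back as 0xfffffffff7dd1234, which no longer points into the original buffer. A small value such as a string length survives unchanged, which is why strlen appeared to work.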
The reason strlen appears to work is only that the value returned as an int "fits" into your result variable. That is, the function actually returned (in its return statement) a 64-bit value (a size_t), but the caller side (your code) only saved the "bottom" half. The value in the bottom half was sufficient to accurately reflect the length (the top half was 0). Were the string enormous, requiring more than 32 bits to represent its length, the same problem would have arisen. (Note that at that point you would have had other issues, such as how you got a contiguous 4 GB string into your process address space.)
Note: Closely related to this is the main reason you never cast the result of malloc() in C programs. A hard cast hides the warnings that would otherwise be emitted for a missing declaration. This is also good evidence that you should always enable pedantic warning levels and turn on warnings-as-errors. With those settings, mistakes like this won't pass compilation and will be discovered quickly.
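For reference, a minimal sketch of the tokenizing step with the header in place might look like the following. The helper name count_word and its parameters are hypothetical and are not taken from the pastebin code. The delimiter set is whatever the caller passes in.

```c
#include <string.h>   /* declares strtok() and strcmp(), so char * returns are handled correctly */

/* Hypothetical helper: count how many tokens in `line` are exactly `word`.
   Note that strtok() modifies `line` in place, so pass a writable buffer. */
int count_word(char *line, const char *delim, const char *word)
{
    int count = 0;
    for (char *tok = strtok(line, delim); tok != NULL; tok = strtok(NULL, delim)) {
        if (strcmp(tok, word) == 0)
            count++;
    }
    return count;
}
```

Calling this on each line read by fgets() and summing the results gives the occurrence count. With gcc or clang, building with -Wall -Wextra -Werror makes a missing string.h a compile error instead of a crash at run time.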
Upvotes: 11