Reputation: 35
I am attempting to parse a text (CSS) file using fscanf and pull out all statements that match this pattern:
@import "some/file/somewhere.css";
To do this, I have the following loop set up:
FILE *file = fopen(pathToSomeFile, "r");
char *buffer = (char *)malloc(sizeof(char) * 9000);
while(!feof(file))
{
// %*[^@] : Read and discard all characters up to a '@'
// %8999[^;] : Read up to 8999 characters starting at '@' to a ';'.
if(fscanf(file, "%*[^@] %8999[^;]", buffer) == 1)
{
// Do stuff with the matching characters here.
// This code is long and not relevant to the question.
}
}
This works perfectly SO LONG AS the VERY FIRST character in the file is not a '@'. (Literally, a single space before the first '@' character in the CSS file will make the code run fine.)
But if the very first character in the CSS file is a '@', then what I see in the debugger is an infinite loop -- execution enters the while loop, hits the fscanf statement, but does not enter the 'if' statement (fscanf fails), and then continues through the loop forever.
I believe my fscanf formatters may need some tweaking, but am unsure how to proceed. Any suggestions or explanations for why this is happening?
Thank you.
Upvotes: 3
Views: 995
Reputation: 15768
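Your format string does the following actions:
- read and discard one or more characters that are not '@';
- skip any whitespace (the space in the format string);
- read and store 1 to 8999 characters that are not ';'.
The first step is the problem: %*[^@] must match at least one character, so when the very first character is '@' it matches nothing and fscanf stops with a matching failure. Unfortunately, there is no format specifier for reading "zero or more" characters from a user-defined set.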
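A quick way to see this failure is to run the question's format string through sscanf on two strings, one with a leading space and one that starts directly with '@'. The helper name original_format_matches is hypothetical; this is only a sketch.

```c
#include <stdio.h>

/* Hypothetical helper: applies the question's format string to an in-memory
   string. `out` must hold at least 9000 bytes.
   Returns sscanf's result: 1 if a statement was stored in `out`,
   0 on a matching failure (e.g. when `input` starts with '@'). */
int original_format_matches(const char *input, char *out)
{
    return sscanf(input, "%*[^@] %8999[^;]", out);
}
```

With " @import \"a.css\";" the call returns 1 and out holds @import "a.css"; with "@import \"a.css\";" it returns 0, because %*[^@] matches nothing at the leading '@'.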
If you don't care about multiple @import statements on a line, you could change your code to read a single line (with fgets) and then extract the @import statement from it: if the first character is not '@', you can use your current format string with sscanf; otherwise, you could use sscanf(line, "%8999[^;]", buffer).
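A minimal sketch of the line-based approach, assuming each @import sits on its own line and no line is longer than the buffer. The names extract_import and count_imports are hypothetical, and counting stands in for the question's processing code.

```c
#include <stdio.h>

/* Picks the sscanf format depending on whether `line` starts with '@'.
   Handles at most one statement per line; `out` must hold at least 9000 bytes.
   Returns 1 if a statement was extracted, 0 otherwise. */
int extract_import(const char *line, char *out)
{
    if (line[0] == '@')
        return sscanf(line, "%8999[^;]", out) == 1;
    return sscanf(line, "%*[^@] %8999[^;]", out) == 1;
}

/* Reads `file` line by line with fgets and counts the lines that yield a
   statement; a real program would process `out` instead of counting.
   Lines longer than the buffer are split by fgets (not handled here). */
int count_imports(FILE *file)
{
    char line[9000];
    char out[9000];
    int count = 0;

    while (fgets(line, sizeof line, file) != NULL) {
        if (extract_import(line, out))
            count++;
    }
    return count;
}
```

Because the loop is driven by fgets returning NULL rather than by feof, it always terminates at the end of the file.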
If multiple @import statements on a line should be handled correctly, you could inspect the next character to be read with getc and then put it back with ungetc.
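A minimal sketch of the peek-and-push-back idea. The function name next_import and its return convention (1 = statement read, 0 = nothing matched, EOF = end of input) are assumptions made for illustration.

```c
#include <stdio.h>

/* Peeks at the next character with getc, pushes it back with ungetc, and
   chooses the format accordingly. `out` must hold at least 9000 bytes.
   Returns 1 if a statement was stored in `out`, 0 if nothing matched before
   the end of the file, EOF once the file is exhausted. */
int next_import(FILE *file, char *out)
{
    int c = getc(file);
    if (c == EOF)
        return EOF;
    ungetc(c, file);                 /* put the peeked character back */

    if (c == '@')                    /* statement starts right here */
        return fscanf(file, "%8999[^;]", out) == 1;
    /* At least one non-'@' character is available, so %*[^@] cannot fail. */
    return fscanf(file, "%*[^@] %8999[^;]", out) == 1;
}
```

A caller would loop with `while ((r = next_import(file, buffer)) != EOF)` and process buffer whenever r == 1. Every call consumes at least one character, so the loop cannot spin in place.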
Upvotes: 0
Reputation: 1814
Oli already said why fscanf failed. Since failure is a normal outcome for fscanf, your busy loop is not a consequence of the fscanf failure itself but of the missing handling for it: on a matching failure the offending '@' stays unread, so every iteration fails on the same character and feof never becomes true.
You have to handle a fscanf failure even if your format were correct (in your special case), because you cannot be sure that the input always matches the format. In fact, you can be sure that far more non-matching input exists than matching input.
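One way to add that handling, sketched under the assumption that the question's format string is kept: drive the loop by fscanf's return value instead of feof, and treat a return of 0 as "the next character is '@'". The function scan_imports and its optional handle callback are hypothetical stand-ins for the question's processing code.

```c
#include <stdio.h>

/* Returns the number of statements found. `handle` may be NULL. */
int scan_imports(FILE *file, void (*handle)(const char *stmt))
{
    char buffer[9000];
    int count = 0;

    for (;;) {
        int r = fscanf(file, "%*[^@] %8999[^;]", buffer);
        if (r == 0)   /* matching failure: %*[^@] found '@' immediately */
            r = fscanf(file, "%8999[^;]", buffer);
        if (r != 1)   /* EOF, read error, or anything unexpected: stop */
            break;
        if (handle != NULL)
            handle(buffer);
        count++;
    }
    return count;
}
```

Stopping on any result other than 1 guarantees the loop ends even on input the format cannot handle.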
Upvotes: 1
Reputation: 272647
I'm not an expert on scanf pattern syntax, but my interpretation of yours is:
- read and discard one or more non-'@' characters, then
- read one or more non-';' characters.
So yes, if your string starts with a '@', then the first part will fail.
I think if you start your format string with some whitespace, then fscanf will eat any leading whitespace in your data string, i.e. simply " %8999[^;]".
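A sketch of how the leading-whitespace format might be used in a loop. It goes slightly beyond the answer: the ';' that stops the scan is left unread, so the loop discards it with getc, and since any text before '@' also lands in the buffer, strchr locates the statement. The name scan_statements and the 8999-character chunk limit are assumptions.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper built on the " %8999[^;]" format.
   Assumes no ';'-terminated chunk exceeds 8999 characters.
   Returns the number of chunks that contain an '@'. */
int scan_statements(FILE *file)
{
    char buffer[9000];
    int count = 0;

    for (;;) {
        /* Skip leading whitespace, then read everything up to the next ';'. */
        int r = fscanf(file, " %8999[^;]", buffer);
        if (r == EOF)
            break;
        if (r == 1) {
            /* Text before the '@' is in the buffer too; find the statement. */
            const char *stmt = strchr(buffer, '@');
            if (stmt != NULL)
                count++;          /* process `stmt` here */
        }
        /* The ';' is still unread (r == 0 means it came right away);
           discard it so the next call can make progress. */
        if (getc(file) == EOF)
            break;
    }
    return count;
}
```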
Upvotes: 2