Bryan

Reputation: 35

C: fscanf - infinite loop when first character matches

I am attempting to parse a text (CSS) file using fscanf and pull out all statements that match this pattern:

@import "some/file/somewhere.css";

To do this, I have the following loop set up:

FILE *file = fopen(pathToSomeFile, "r");
char *buffer = (char *)malloc(sizeof(char) * 9000);

while(!feof(file))
{
    // %*[^@] : Read and discard all characters up to a '@'
    // %8999[^;] : Read up to 8999 characters starting at '@' to a ';'.
    if(fscanf(file, "%*[^@] %8999[^;]", buffer) == 1)
    {
        // Do stuff with the matching characters here.
        // This code is long and not relevant to the question.
    }
}

This works perfectly SO LONG AS the VERY FIRST character in the file is not a '@'. (Literally, a single space before the first '@' character in the CSS file will make the code run fine.)

But if the very first character in the CSS file is a '@', then what I see in the debugger is an infinite loop -- execution enters the while loop, hits the fscanf statement, but does not enter the 'if' statement (fscanf fails), and then continues through the loop forever.

I believe my fscanf formatters may need some tweaking, but am unsure how to proceed. Any suggestions or explanations for why this is happening?

Thank you.

Upvotes: 3

Views: 995

Answers (3)

Bart van Ingen Schenau

Reputation: 15768

Your format string does the following actions:

  • Read (and discard) 1 or more non-@ characters
  • Read (and discard) 0 or more whitespace characters (due to the space in the format string)
  • Read and store 1 to 8999 non-; characters

Unfortunately, there is no format specifier for reading "zero or more" characters from a user-defined set.
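The matching failure described above can be reproduced in isolation with sscanf. This is only a sketch: the helper name scan_at_statement and the 64-byte buffer are made up for illustration, and the format is the one from the question with a shorter field width.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper (not from the question): applies the question's
 * format, with the width reduced to 63, to an in-memory string.
 * 'out' must have room for at least 64 bytes.
 * Returns whatever sscanf returns: the number of items stored. */
int scan_at_statement(const char *text, char *out)
{
    assert(text != NULL && out != NULL);

    /* %*[^@]  needs at least ONE non-'@' character to succeed.
     * ' '     skips zero or more whitespace characters.
     * %63[^;] stores 1 to 63 non-';' characters. */
    return sscanf(text, "%*[^@] %63[^;]", out);
}
```

Called with " @import \"a.css\";" (note the leading space) it returns 1 and stores the text from '@' up to, but not including, the ';'. Called with the same text minus the leading space it returns 0, because the first directive cannot match an empty sequence, and nothing is consumed. That is the state the loop in the question keeps retrying.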

If you don't care about multiple @import statements on a line, you could change your code to read a single line (with fgets) and then extract the @import statement from it: if the first character of the line is not '@', you can use your current format string with sscanf; otherwise, you could use sscanf(line, "%8999[^;]", buffer).

If multiple @import statements on a line should be handled correctly, you could inspect the next character to be read with getc and then put it back with ungetc.
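The getc/ungetc idea might look roughly like the following. Treat it as a sketch under assumptions rather than a drop-in fix: next_at_statement is a hypothetical helper, the caller is assumed to supply a buffer of at least 9000 bytes (to match the 8999 width from the question), and the terminating ';' is simply discarded.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: reads the next '@' ... ';' statement from fp.
 * 'buf' must have room for at least 9000 bytes.
 * Returns 1 if a statement was stored in buf, 0 when no more are found. */
int next_at_statement(FILE *fp, char *buf)
{
    int c;

    assert(fp != NULL && buf != NULL);

    /* Peek at the next character without consuming it. */
    c = getc(fp);
    if (c == EOF)
        return 0;
    ungetc(c, fp);

    /* Only skip when there is at least one non-'@' character to skip,
     * since %*[^@] cannot match an empty sequence. */
    if (c != '@')
        fscanf(fp, "%*[^@]");

    /* The stream is now positioned at '@' or at end of file. */
    if (fscanf(fp, "%8999[^;]", buf) != 1)
        return 0;

    /* Consume the ';' (if any) so the next call starts after it. */
    (void)getc(fp);
    return 1;
}
```

A caller would then loop with while (next_at_statement(file, buffer)) { ... }, which ends on its own at end of file instead of relying on feof.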

Upvotes: 0

Tilo Prütz

Reputation: 1814

Oli already explained why fscanf failed. Since failure is a normal outcome for fscanf, your busy loop is a consequence not of the failure itself but of the missing handling for it.

You have to handle an fscanf failure even if your format were correct for this particular case, because you cannot be sure that the input will always match the format. In fact, you can be sure that far more non-matching input exists than matching input.
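As an illustration of handling the failure, here is a sketch that keeps the question's format string unchanged but drives the loop from fscanf's return value instead of feof. The function name count_at_statements and the stopped_early flag are invented for this example, and the "do stuff" part is reduced to counting.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: counts '@' ... ';' statements using the format
 * from the question. Sets *stopped_early to 1 if scanning ended because
 * of a matching failure or read error, and to 0 if it ended at end of file. */
int count_at_statements(FILE *fp, int *stopped_early)
{
    static char buffer[9000];
    int count = 0;

    assert(fp != NULL && stopped_early != NULL);

    /* Continue only while fscanf actually stored a statement.
     * Any other result (0 or EOF) ends the loop, so it cannot spin. */
    while (fscanf(fp, "%*[^@] %8999[^;]", buffer) == 1)
        count++;

    /* Work out why the loop stopped. */
    *stopped_early = (ferror(fp) || !feof(fp)) ? 1 : 0;
    return count;
}
```

This does not fix the format: for a file that starts with '@' it finds nothing and reports stopped_early as 1. The point is only that the failure is now detected and the loop terminates, so the caller can decide how to recover.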

Upvotes: 1

Oliver Charlesworth

Reputation: 272647

I'm not an expert on scanf pattern syntax, but my interpretation of yours is:

  • Match a non-empty sequence of non-'@' characters, then
  • Match a non-empty sequence of up to 8999 non-';' characters

So yes, if your string starts with a '@', then the first part will fail.

I think if you start your format string with some whitespace, then fscanf will eat any leading whitespace in your data string, i.e. simply " %8999[^;]".

Upvotes: 2
