duckduckgo

Reputation: 1295

Is it good to use sscanf for parsing strings?

I have been using sscanf() in my parser to extract CSS-like tokens such as color codes. Some variations are below:

#FDC69A
#ff0
orange

Example code:

#include <cstdio>

int main() {
    unsigned int r, g, b;  // %x stores into unsigned int*, not int*
    const char* s = "#FAFAFA";
    if (sscanf(s, "#%02x%02x%02x", &r, &g, &b) == 3) {
        // color code ok
    }
}
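A minimal sketch (not from the original question) of how the same sscanf approach might cover both hex forms listed above. The name parse_hex_color is hypothetical, and the strspn pre-check is an added assumption: %x on its own skips leading whitespace and accepts a sign, so an input like "# 1 2 3" would otherwise satisfy the == 3 test.

```cpp
#include <cstdio>
#include <cstring>

// Hypothetical helper: accepts "#RRGGBB" and "#RGB" (the hex forms in the question).
bool parse_hex_color(const char* s, unsigned& r, unsigned& g, unsigned& b) {
    if (s[0] != '#') return false;
    std::size_t digits = std::strlen(s + 1);
    // %x would also skip spaces and accept a sign, so make sure everything
    // after '#' is a plain hex digit before handing it to sscanf.
    if (std::strspn(s + 1, "0123456789abcdefABCDEF") != digits) return false;

    if (digits == 6)
        return std::sscanf(s, "#%2x%2x%2x", &r, &g, &b) == 3;
    if (digits == 3 && std::sscanf(s, "#%1x%1x%1x", &r, &g, &b) == 3) {
        // Short form doubles each digit: "#ff0" means "#ffff00" (0xf * 17 == 0xff).
        r *= 17; g *= 17; b *= 17;
        return true;
    }
    return false;  // named colors such as "orange" would need a lookup table
}
```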

My preferred language for the current project is C++. I think sscanf can be faster than regular character-by-character parsing, and the overall code will be bug-free and minimal; still, it may have portability issues across different compilers.

One thing I noticed is that popular open-source projects do not use sscanf for tokenizing input buffers; instead they do it char by char. Is using sscanf for parsing, as I am doing, a bad programming practice?

Upvotes: 2

Views: 1749

Answers (2)

Tony Delroy

Reputation: 106096

if(sscanf(s, "#%02x%02x%02x", &r, &g, &b) == 3) is robust... nothing to worry about there.

Historically, the big concern with those functions was that someone might write a format specifier that doesn't match the argument (e.g. %d not given an int*)... many modern compilers validate format strings well enough to catch accidents like that.
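For illustration, a hypothetical read_long helper: the commented-out call is the kind of mismatch that GCC (with -Wall, which enables -Wformat) and Clang (by default) typically warn about, and the live call uses the matching %ld specifier.

```cpp
#include <cstdio>

// Hypothetical helper showing a format specifier that matches its argument.
bool read_long(const char* s, long& value) {
    // Mismatch example, left commented out because it is undefined behavior:
    //   std::sscanf(s, "%d", &value);   // %d expects int*, but &value is long*
    return std::sscanf(s, "%ld", &value) == 1;  // %ld matches long*
}
```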

Still, C++ has iostreams, and people tend to use those for many I/O and parsing operations because stream destructors automatically flush and close files and release descriptors, streams are type safe and extensible to user-defined types, you can generally reuse parsing/output code for any type of stream, and they're often convenient. They'd be significantly more tedious for your specific test above, though.
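To show why iostreams are more tedious for this particular check, here is one possible iostream-only version; parse_hex_color_stream is a hypothetical name. It relies on std::setw capping std::string extraction at two characters and on std::hex for the base-16 conversion, and it also rejects trailing characters, which the sscanf one-liner does not.

```cpp
#include <cctype>
#include <iomanip>
#include <sstream>
#include <string>

// Hypothetical helper: parse "#RRGGBB" using only iostreams.
bool parse_hex_color_stream(const std::string& text, unsigned& r, unsigned& g, unsigned& b) {
    std::istringstream in(text);
    in >> std::noskipws;  // do not silently skip whitespace between fields
    char hash = 0;
    if (!(in >> hash) || hash != '#') return false;

    unsigned* channels[] = {&r, &g, &b};
    for (unsigned* channel : channels) {
        std::string pair;
        // For std::string, setw(2) limits the extraction to two characters.
        if (!(in >> std::setw(2) >> pair) || pair.size() != 2) return false;
        for (char c : pair)
            if (!std::isxdigit(static_cast<unsigned char>(c))) return false;
        std::istringstream hex_in(pair);
        hex_in >> std::hex >> *channel;  // two valid hex digits always convert
    }
    // Reject trailing characters such as "#FAFAFAZZ".
    return in.peek() == std::char_traits<char>::eof();
}
```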

If you've noticed lots of OSS programs scanning character by character, it may be because:

  • They're doing more complex parsing, where they want to branch to different parsing logic after reading individual characters, or

    • In your code you know exactly what format to expect, so it's reasonable to use sscanf to test for it, but if you were writing, say, a compiler, it'd be too slow to try a huge if/else chain of hundreds of sscanf attempts to recognise tokens.


  • Relevant for scanf and fscanf but not sscanf: they avoid scanning too far so they can ungetc, which (from memory) is only portably guaranteed to work for 1 character.
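A rough sketch of the char-by-char style described above, assuming a tiny hypothetical token set (hex colors, identifiers, and single "other" characters); Kind, Token and tokenize are illustrative names. The scanner inspects one character and branches to the matching sub-scanner, so the input is walked once rather than retried against many sscanf patterns.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical token kinds for a tiny CSS-like tokenizer.
enum class Kind { HexColor, Ident, Other };

struct Token {
    Kind kind;
    std::string text;
};

std::vector<Token> tokenize(const std::string& in) {
    std::vector<Token> out;
    std::size_t i = 0;
    // Pass characters to <cctype> as unsigned char to avoid negative values.
    auto ch = [&](std::size_t k) { return static_cast<unsigned char>(in[k]); };
    while (i < in.size()) {
        std::size_t start = i;
        if (std::isspace(ch(i))) {
            ++i;  // skip whitespace between tokens
        } else if (in[i] == '#') {  // branch: hex color
            ++i;
            while (i < in.size() && std::isxdigit(ch(i))) ++i;
            out.push_back({Kind::HexColor, in.substr(start, i - start)});
        } else if (std::isalpha(ch(i))) {  // branch: identifier such as "orange"
            while (i < in.size() && (std::isalnum(ch(i)) || in[i] == '-')) ++i;
            out.push_back({Kind::Ident, in.substr(start, i - start)});
        } else {  // anything else becomes a one-character token
            ++i;
            out.push_back({Kind::Other, in.substr(start, 1)});
        }
    }
    return out;
}
```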

Upvotes: 1

Keith Thompson
Keith Thompson

Reputation: 263247

The biggest problem with sscanf (as well as scanf and fscanf) is that numeric overflow causes undefined behavior. For example:

const char *s = "999999999999999999999999999999";
int n;
sscanf(s, "%d", &n);

The C standard says exactly nothing about how this code behaves. It might set n to some arbitrary value, it might report an error, or it might crash.

(In practice, existing implementations are likely to behave sensibly, for some value of "sensibly".)
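One way to sidestep that undefined behavior in C++17 is std::from_chars, which reports overflow as std::errc::result_out_of_range instead (strtol with errno set to ERANGE is the C counterpart). parse_int is a hypothetical wrapper name.

```cpp
#include <charconv>
#include <cstring>
#include <system_error>

// Hypothetical helper: parse an entire string as int, rejecting overflow
// and trailing characters instead of invoking undefined behavior.
bool parse_int(const char* s, int& out) {
    const char* end = s + std::strlen(s);
    auto [ptr, ec] = std::from_chars(s, end, out);
    return ec == std::errc() && ptr == end;
}
```

Note that std::from_chars, unlike %d, does not skip leading whitespace or accept a leading +.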

Upvotes: 8
