user1717828

Reputation: 7225

What does %[^<] (and friends) mean in the formatted string family?

A comment (which should probably be submitted as an answer) has the code

sscanf(string, "<title>%[^<]</title>", extracted_string);

Running the code seems to copy the text between the <title> tags to extracted_string, but I cannot find any reference to a caret in the documentation for the formatted I/O (printf/scanf) family, either in the man pages or elsewhere online.

Can someone point me to a resource that explains the use of %[^<], and other similar syntax, in the sscanf() family?

Upvotes: 2

Views: 981

Answers (3)

user3629249

Reputation: 16540

This link explains the use of [ and ^ in the scanf family of functions:

(emphasis mine)

http://www.cdf.toronto.edu/~ajr/209/notes/printf.html


[

Matches a nonempty sequence of characters from the specified set of accepted characters; the next pointer must be a pointer to char, and there must be enough room for all the characters in the string, plus a terminating null byte. The usual skip of leading white space is suppressed. The string is to be made up of characters in (or not in) a particular set; the set is defined by the characters between the open bracket [ character and a close bracket ] character. The set excludes those characters if the first character after the open bracket is a circumflex (^). To include a close bracket in the set, make it the first character after the open bracket or the circumflex; any other position will end the set. The hyphen character - is also special; when placed between two other characters, it adds all intervening characters to the set. To include a hyphen, make it the last character before the final close bracket. For instance, [^]0-9-] means the set "everything except close bracket, zero through nine, and hyphen". The string ends with the appearance of a character not in the (or, with a circumflex, in) set or when the field width runs out.
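The rules quoted above can be tried out with a small sketch. The helper names below are hypothetical and exist only for illustration. Note that the a-z and 0-9 range forms behave as the quoted text describes on glibc and most common implementations, but the C standard leaves the meaning of a hyphen inside a scanlist implementation-defined.

```c
#include <stdio.h>

/* Hypothetical helper: copies the leading run of lowercase letters from
 * `input` into `out` (at most 15 characters plus the terminator).
 * Returns 1 if at least one character matched, 0 otherwise. */
int leading_lowercase(const char *input, char out[16])
{
    /* Range form: a-z adds every character from 'a' through 'z'. */
    return sscanf(input, "%15[a-z]", out) == 1;
}

/* Hypothetical helper: copies the leading run of characters that are NOT
 * a close bracket, a digit, or a hyphen. */
int until_bracket_digit_hyphen(const char *input, char out[16])
{
    /* ^ negates the set; ] comes first so it is a member of the set,
     * and - comes last so it is taken literally. */
    return sscanf(input, "%15[^]0-9-]", out) == 1;
}
```

Because %[ does not skip leading white space, spaces count as ordinary characters: with the second helper, the input "x y-z" yields "x y", and the scan stops at the hyphen.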

Upvotes: 1

Sourav Ghosh

Reputation: 134286

From the C11 standard document, chapter §7.21.6.2, Paragraph 12, conversion specifiers, (emphasis mine)

[

Matches a nonempty sequence of characters from a set of expected characters (the scanset).

....

The conversion specifier includes all subsequent characters in the format string, up to and including the matching right bracket (]). The characters between the brackets (the scanlist) compose the scanset, unless the character after the left bracket is a circumflex (^), in which case the scanset contains all characters that do not appear in the scanlist between the circumflex and the right bracket.
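One consequence of the "nonempty sequence" wording is that a negated scanset fails to match when the very next input character is in the scanlist. A minimal sketch, assuming a hypothetical helper extract_title and a 100-byte destination buffer:

```c
#include <stdio.h>

/* Hypothetical helper: extracts the text between a leading "<title>" and
 * the next '<'. Returns 1 on success, 0 if the input does not start with
 * "<title>" or if the title is empty (the scanset must match at least
 * one character). */
int extract_title(const char *html, char out[100])
{
    /* %99[^<] reads up to 99 characters that are not '<'. */
    return sscanf(html, "<title>%99[^<]", out) == 1;
}
```

Checking the return value matters here: for "<title></title>" the conversion fails, sscanf returns 0, and the destination buffer is left unmodified.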

A draft version of the standard, found online.

Upvotes: 6

Iharob Al Asimi

Reputation: 53006

It means "match anything that is not a <". It is not a good idea to do that without specifying the maximum length of the destination buffer. If your destination buffer can hold, say, 100 characters, then

char extracted_string[100];
sscanf(string, "<title>%99[^<]</title>", extracted_string);

would be a better solution.

Using strstr() for this purpose instead lets you allocate extracted_string dynamically, sized to the actual length of the title.
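A minimal sketch of the strstr() approach, assuming a hypothetical helper extract_title_dynamic that returns a heap-allocated copy the caller must free():

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns a malloc'd copy of the text between the
 * first "<title>" and the following "</title>", or NULL if either tag is
 * missing or allocation fails. The caller must free() the result. */
char *extract_title_dynamic(const char *html)
{
    static const char open_tag[] = "<title>";
    static const char close_tag[] = "</title>";

    const char *start = strstr(html, open_tag);
    if (start == NULL)
        return NULL;
    start += sizeof open_tag - 1;          /* skip past "<title>" */

    const char *end = strstr(start, close_tag);
    if (end == NULL)
        return NULL;

    size_t length = (size_t)(end - start);
    char *title = malloc(length + 1);      /* +1 for the terminator */
    if (title == NULL)
        return NULL;

    memcpy(title, start, length);
    title[length] = '\0';
    return title;
}
```

Unlike the sscanf() version, this one does not require the input to begin with the tag, has no fixed length limit, and returns an empty string for an empty title.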

Upvotes: 3
