prapin

Reputation: 6858

Custom support for __attribute__((format))

Both GCC and Clang support compile-time checking of the arguments passed to variadic functions such as printf. These compilers accept syntax like:

extern void dprintf(int dlevel, const char *format, ...)
  __attribute__((format(printf, 2, 3)));  /* 2=format 3=params */
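To illustrate what the attribute buys you, here is a minimal sketch. The function name debug_format and its std::string return type are assumptions made for this example (the name dprintf is avoided because POSIX already declares a function with that name in <stdio.h>).

```cpp
#include <cstdarg>
#include <cstdio>
#include <string>

// Hypothetical helper, not part of the question's code base.
// format(printf, 2, 3): parameter 2 is the format string, and the
// variadic arguments to check start at parameter 3.
std::string debug_format(int level, const char* format, ...)
    __attribute__((format(printf, 2, 3)));

std::string debug_format(int level, const char* format, ...)
{
    char buffer[256];
    va_list args;
    va_start(args, format);
    std::vsnprintf(buffer, sizeof buffer, format, args);  // truncates long output
    va_end(args);
    return "[" + std::to_string(level) + "] " + buffer;
}
```

With -Wformat (part of -Wall), a call such as debug_format(1, "%s items", 3) draws a warning because %s expects a char pointer, while debug_format(1, "%d items", 3) compiles cleanly. For a non-static member function the indices shift by one, because the implicit this parameter counts as argument 1.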

On OS X, the Cocoa framework also uses an extension of this attribute for NSString:

#define NS_FORMAT_FUNCTION(F,A) __attribute__((format(__NSString__, F, A)))

In our company, we have a custom C++ framework with a bunch of classes like BaseString all deriving from BaseObject. In BaseString there are a few variable argument methods similar to sprintf, but with some extensions. For example, "%S" expects an argument of type BaseString*, and "%@" expects a BaseObject* argument.

I would like to perform a compile-time check of the arguments in our projects, but because of the extensions, __attribute__((format(printf))) gives lots of false-positive warnings.

Is there a way to customize the support of __attribute__((format)) for one of the two compilers? If this requires a patch to the compiler source, is it doable in a reasonable amount of time? Alternatively, are there other lint-like tools that could perform the check?

Upvotes: 7

Views: 3376

Answers (4)

With a recent version of GCC (I recommend 4.7 or newer, but you could try GCC 4.6), you can add your own variable and function attributes through a GCC plugin (with the PLUGIN_ATTRIBUTES hook) or a MELT extension. MELT is a domain-specific language for extending GCC (implemented as a [meta-]plugin).

If using a plugin (e.g. MELT) you won't need to recompile the source code of GCC. But you need a plugin-enabled GCC (check with gcc -v).

As of 2020, MELT is no longer maintained (for lack of funding); however, you could write your own GCC plugin in C++ for GCC 10 to perform such checks.

Some Linux distributions don't enable plugins in their gcc; if yours doesn't, please complain to your distribution vendor. Others provide a package for GCC plugin development, e.g. gcc-4.7-plugin-dev for Debian or Ubuntu.

Upvotes: 5

dshin

Reputation: 2398

With C++11, it is possible to solve this problem by replacing __attribute__((format)) with a combination of constexpr, decltype, and variadic parameter packs. Pass the format string into a constexpr function that extracts all the % specifiers at compile time, and validate that the n-th specifier matches the decltype of the (n+1)-th argument.

Here is a sketch of the solution...

If you have:

int x = 3;
Foo foo;
my_printf("%d %Q\n", x, foo);

You will need a macro wrapper for my_printf, using the trick described here, to get something like this:

#define my_printf(fmt, ...) \
do { \
    static_assert(FmtValidator<decltype(makeTypeHolder(__VA_ARGS__))>::check(fmt), \
        "one or more format specifiers do not match their arguments"); \
    my_printf_impl(fmt, ## __VA_ARGS__); \
} while (0)  /* do/while keeps "my_printf(...);" safe inside if/else */

You'll need to write FmtValidator and makeTypeHolder().

makeTypeHolder will look something like this:

    template<typename... Ts> struct TypeHolder {};

    template<typename... Ts>
    TypeHolder<Ts...> makeTypeHolder(const Ts&... args)
    {
        return TypeHolder<Ts...>();
    }

Its purpose is to create a type uniquely determined by the types of the arguments passed into my_printf(). The FmtValidator then needs to validate that these types are consistent with the % specifiers found in fmt.
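As a quick check of what makeTypeHolder() produces, the following sketch repeats the definition and inspects the resulting types with std::is_same. The specific argument values are only examples.

```cpp
#include <type_traits>

template<typename... Ts> struct TypeHolder {};

template<typename... Ts>
TypeHolder<Ts...> makeTypeHolder(const Ts&...)
{
    return TypeHolder<Ts...>();
}

// decltype never evaluates the call, so the arguments do not have to be
// constant expressions; only their types matter.
static_assert(std::is_same<decltype(makeTypeHolder(3, 2.5)),
                           TypeHolder<int, double>>::value,
              "argument types are captured in order");

// Because the parameters are taken by const reference, a string literal
// is captured as an array type rather than as const char*.
static_assert(std::is_same<decltype(makeTypeHolder("abc")),
                           TypeHolder<char[4]>>::value,
              "string literals do not decay to pointers here");
```

The second assertion points at a detail worth planning for: a validator for %s probably has to accept char[N] as well as const char*.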

Next, FmtValidator<T>::check() needs to be written to extract the % specifiers at compile time (i.e., as a constexpr function). This requires some compile-time recursion and looks like this:

    template<typename... Ts>
    struct FmtValidator;

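    // recursion base case: no arguments left (TypeHolder<> is what
    // makeTypeHolder() yields when called without arguments)
    template<>
    struct FmtValidator<TypeHolder<>>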
    {
        static constexpr bool check(const char* fmt)
        {
            return *fmt == '\0' ? true :
                    *fmt != '%' ? check(fmt + 1) :
                    fmt[1] == '%' ? check(fmt + 2) : false;
        }
    };

    // recursion
    template<typename T, typename... Ts>
    struct FmtValidator<TypeHolder<T, Ts...>>
    {
        static constexpr bool check(const char* fmt)
        {
            // find the first % specifier in fmt, validate it against T,
            // and then recursively dispatch with Ts... and the remainder of fmt
            ...
        }
    };

You can validate individual types against individual % specifiers with something like this:

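    // primary template; specialize it for each supported argument type
    template<typename T>
    struct specmatch;

    // strmatches() is a constexpr helper you also need to write
    template<>
    struct specmatch<int>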
    {
        static constexpr bool match(const char* c, const char* cend)
        {
            return strmatches(c, cend, "d") ||
                    strmatches(c, cend, "i");
        }
    };

    // add other specmatch specializations for float, const char*, etc.

And then, you are free to write your own validators with your own custom types.
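Putting the pieces together, here is one possible way to fill in the recursion, offered as a sketch rather than the definitive implementation. It makes several simplifying assumptions that are not in the outline above: each specifier is a single character right after the % (no flags, width, precision, or length modifiers), specmatch::match() takes that one character instead of a (c, cend) range, unknown types are rejected by default, and Foo is an empty stand-in type.

```cpp
template<typename... Ts> struct TypeHolder {};

struct Foo {};  // hypothetical user type, printed with the custom %Q

// specmatch<T>::match(c): does conversion character c accept a T?
template<typename T> struct specmatch
{
    static constexpr bool match(char) { return false; }  // unknown type
};
template<> struct specmatch<int>
{
    static constexpr bool match(char c) { return c == 'd' || c == 'i'; }
};
template<> struct specmatch<double>
{
    static constexpr bool match(char c) { return c == 'f' || c == 'e' || c == 'g'; }
};
template<> struct specmatch<Foo>
{
    static constexpr bool match(char c) { return c == 'Q'; }
};

template<typename T> struct FmtValidator;

// Base case: no arguments left, so only literal text and "%%" may remain.
template<> struct FmtValidator<TypeHolder<>>
{
    static constexpr bool check(const char* fmt)
    {
        return *fmt == '\0' ? true :
               *fmt != '%' ? check(fmt + 1) :
               fmt[1] == '%' ? check(fmt + 2) : false;
    }
};

// Recursive case: find the next specifier, match it against T,
// then continue with the remaining types and the rest of the string.
template<typename T, typename... Ts>
struct FmtValidator<TypeHolder<T, Ts...>>
{
    static constexpr bool check(const char* fmt)
    {
        return *fmt == '\0' ? false :            // argument without a specifier
               *fmt != '%' ? check(fmt + 1) :    // skip literal text
               fmt[1] == '%' ? check(fmt + 2) :  // skip an escaped percent sign
               specmatch<T>::match(fmt[1]) &&
                   FmtValidator<TypeHolder<Ts...>>::check(fmt + 2);
    }
};

static_assert(FmtValidator<TypeHolder<int, Foo>>::check("%d %Q\n"),
              "specifiers match the argument types");
static_assert(!FmtValidator<TypeHolder<int, Foo>>::check("%Q %d\n"),
              "swapped specifiers are rejected");
static_assert(!FmtValidator<TypeHolder<int>>::check("%d %d"),
              "a specifier without an argument is rejected");
```

Each check() body is a single return statement, which is what C++11 constexpr functions require. Supporting real printf syntax would mean scanning past flags and length modifiers before reaching the conversion character.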

Upvotes: 2

prapin

Reputation: 6858

A year and a half after asking this question, I came up with a totally different approach to solving the real problem: is there any way to statically check the types of custom variadic formatting statements?

For completeness and because it can help other people, here is the solution I finally implemented. It has two advantages over what the original question asked for:

  • Relatively simple: implemented in less than a day;
  • Compiler-independent: can check C++ code on any platform (Windows, Android, OS X, ...).

A Perl script parses the source code, finds the formatting strings and decodes the percent modifiers inside them. It then wraps all arguments with a call to a template identity function CheckFormat<>. Example:

str->appendFormat("%hhu items (%.2f %%) from %S processed", 
    nbItems, 
    nbItems * 100. / totalItems, 
    subject);

Becomes:

str->appendFormat("%hhu items (%.2f %%) from %S processed", 
    CheckFormat<CFL::u, CFM::hh>(nbItems  ), 
    CheckFormat<CFL::f, CFM::_>(nbItems * 100. / totalItems  ), 
    CheckFormat<CFL::S, CFM::_, const BaseString*>(subject  ));

The enumerations CFL and CFM and the template function CheckFormat must be defined in a common header file like this (this is an extract; there are around 24 overloads):

enum class CFL
{
    c, d, i=d, star=i, u, o=u, x=u, X=u, f, F=f, e=f, E=f, g=f, G=f, p, s, S, P=S, at
};
enum class CFM
{
    hh, h, l, z, ll, L=ll, _
};
template<CFL letter, CFM modifier, typename T> inline T CheckFormat(T value) { CFL test= value; (void)test; return value; }
template<> inline const BaseString* CheckFormat<CFL::S, CFM::_, const BaseString*>(const BaseString* value) { return value; }
template<> inline const BaseObject* CheckFormat<CFL::at, CFM::_, const BaseObject*>(const BaseObject* value) { return value; }
template<> inline const char* CheckFormat<CFL::s, CFM::_, const char*>(const char* value) { return value; }
template<> inline const void* CheckFormat<CFL::p, CFM::_, const void*>(const void* value) { return value; }
template<> inline char CheckFormat<CFL::c, CFM::_, char>(char value) { return value; }
template<> inline double CheckFormat<CFL::f, CFM::_, double>(double value) { return value; }
template<> inline float CheckFormat<CFL::f, CFM::_, float>(float value) { return value; }
template<> inline int CheckFormat<CFL::d, CFM::_, int>(int value) { return value; }

...

Once the compilation errors have been dealt with, it is easy to recover the original form by replacing every match of the regular expression CheckFormat<[^<]*>\((.*?) \) with its capture.
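To see why a mismatched argument fails to compile, here is a trimmed, self-contained sketch of the header. BaseObject and BaseString are reduced to empty stand-ins (an assumption for this example), only three specializations are kept, and checked_count is a hypothetical helper added for illustration.

```cpp
// Stand-ins for the framework classes; the real ones are far richer.
struct BaseObject {};
struct BaseString : BaseObject {};

enum class CFL
{
    c, d, i=d, star=i, u, o=u, x=u, X=u, f, F=f, e=f, E=f, g=f, G=f, p, s, S, P=S, at
};
enum class CFM
{
    hh, h, l, z, ll, L=ll, _
};

// Fallback, instantiated only when no specialization matches. A scoped enum
// has no implicit conversion from other types, so "CFL test = value" is
// ill-formed for them and the compiler reports the mismatch at the call site.
template<CFL letter, CFM modifier, typename T>
inline T CheckFormat(T value) { CFL test = value; (void)test; return value; }

// Accepted (letter, modifier, type) combinations are plain identity functions.
template<> inline int CheckFormat<CFL::d, CFM::_, int>(int value) { return value; }
template<> inline double CheckFormat<CFL::f, CFM::_, double>(double value) { return value; }
template<> inline const BaseString* CheckFormat<CFL::S, CFM::_, const BaseString*>(const BaseString* value) { return value; }

// Hypothetical helper showing a checked argument.
inline int checked_count(int n)
{
    return CheckFormat<CFL::d, CFM::_>(n);       // OK: %d with an int
    // return CheckFormat<CFL::d, CFM::_>(2.5);  // error: double is not convertible to CFL
}
```

Because i is declared as an alias of d, the specialization for CFL::d also covers %i, which keeps the number of specializations down.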

Upvotes: 2

ecatmur

Reputation: 157334

It's doable, but it's certainly not easy; part of the problem is that BaseString and BaseObject are user-defined types, so you need to define the format specifiers dynamically. Fortunately GCC at least has support for this, but using it would still require patching the compiler.

The magic is in the handle_format_attribute function in gcc/c-family/c-format.c, which calls initialization functions for format specifiers that refer to user-defined types. A good example to base your support on would be the gcc_gfc format type, because it defines a format specifier %L for locus *:

/* This will require a "locus" at runtime.  */
{ "L",   0, STD_C89, { T89_V,   BADLEN,  BADLEN,  BADLEN,  BADLEN,  BADLEN,  BADLEN,  BADLEN,  BADLEN  }, "", "R", NULL },

Obviously though you'd want to base your format_char_info array on print_char_table, as that defines the standard printf specifiers; gcc_gfc is substantially cut down in comparison.

The patch that added gcc_gfc is http://gcc.gnu.org/ml/fortran/2005-07/msg00018.html; it should be fairly obvious from that patch how and where you'd need to make your additions.

Upvotes: 2
