Mike Crowe

Reputation: 706

Matching multiple arguments to a particular function individually with Clang AST matcher

I'm trying to match unnecessary calls to c_str() in the arguments of functions that are happy to take the std::string directly, so that I can write a clang-tidy check that removes them automatically, turning statements like:

fmt::print("{} {}", s1.c_str(), s2.c_str());

into

fmt::print("{} {}", s1, s2);

Whilst I've been able to come up with a matcher that matches the entire statement, it would be more convenient if I could bind all the c_str calls individually. I've tried

auto StringType = hasUnqualifiedDesugaredType(recordType(hasDeclaration(cxxRecordDecl(
                      hasName("::std::basic_string")))));
auto PrintCall = hasName("::fmt::print");
StatementMatcher CStrMatcher = traverse(
      TK_AsIs, callExpr(callee(functionDecl(PrintCall)),
                        hasAnyArgument(cxxMemberCallExpr(
                                            callee(cxxMethodDecl(hasName("c_str"))),
                                            on(hasType(StringType))).bind("c_str")))
      );

but I only get a single match no matter how many arguments call c_str. Is there a way to iterate over the separate argument matches that I've bound, or do I need to iterate over all the arguments (whether they matched or not) myself in the check member?

Upvotes: 2

Views: 640

Answers (1)

Scott McPeak

Reputation: 12749

The procedure for matching multiple arguments of a variable-argument function depends on whether it is a function template using a parameter pack or an ordinary function with ..., so I'll answer both ways.

Function template with a parameter pack

If the callee is declared something like this:

template <class... Args>
void format(char const *fmt, Args&&... args);

then we can use forEachArgumentWithParam (see AST Matchers Reference) to bind to all of the arguments that satisfy a condition:

    forEachArgumentWithParam(
      expr(
        cxxMemberCallExpr(
          callee(
            cxxMethodDecl(
              hasName("c_str")
            )
          )
        )
      ).bind("arg"),
      anything()
    )

This yields a separate MatchResult for each argument that satisfies the condition.
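
To see why this behaves differently from the hasAnyArgument attempt in the question, here is a toy model in plain C++ with no Clang dependency. It is only a sketch of the iteration behavior as I understand it, not Clang's implementation: Arg, ArgPredicate, endsWithCStrCall, modelHasAnyArgument, and modelForEachArgumentWithParam are all hypothetical names made up for illustration, and an "argument" is reduced to its source text.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Toy stand-in for an AST argument node: just its source text.
using Arg = std::string;
using ArgPredicate = std::function<bool(const Arg &)>;

// Hypothetical predicate standing in for the c_str() member-call matcher.
bool endsWithCStrCall(const Arg &arg) {
  const std::string suffix = ".c_str()";
  return arg.size() >= suffix.size() &&
         arg.compare(arg.size() - suffix.size(), suffix.size(), suffix) == 0;
}

// Models hasAnyArgument: stops at the first satisfying argument, so a call
// yields at most one result however many arguments qualify.
std::vector<std::size_t> modelHasAnyArgument(const std::vector<Arg> &args,
                                             const ArgPredicate &pred) {
  assert(pred && "a predicate is required");
  for (std::size_t i = 0; i < args.size(); ++i)
    if (pred(args[i]))
      return {i};
  return {};
}

// Models forEachArgumentWithParam with anything() as the parameter matcher:
// argument i is considered only if a declared parameter i exists, and every
// satisfying argument produces its own result.
std::vector<std::size_t> modelForEachArgumentWithParam(
    const std::vector<Arg> &args, std::size_t declaredParamCount,
    const ArgPredicate &pred) {
  assert(pred && "a predicate is required");
  std::vector<std::size_t> results;
  for (std::size_t i = 0; i < args.size() && i < declaredParamCount; ++i)
    if (pred(args[i]))
      results.push_back(i);
  return results;
}
```

Fed the five arguments of a call like format("{}", 1, s.c_str(), s.c_str(), s.other()), modelHasAnyArgument reports only index 2. modelForEachArgumentWithParam reports indices 2 and 3 when all five parameters are declared, as in an instantiated parameter pack, and reports nothing when only the first parameter is declared.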

Complete example

Input:

// test-ft-vararg.cc
// Function template that accepts a variable number of arguments.

template <class... Args>
void format(char const *fmt, Args&&... args)
{}

template <class... Args>
void otherFunc(char const *fmt, Args&&... args)
{}

struct S {
  int c_str();
  int other();
};

void f()
{
  S s;

  format("{}", 1, 2, 3);       // Not reported.

  format("{}",
    1,
    s.c_str(),     // Reported.
    s.c_str(),     // Reported.
    s.other());

  otherFunc("{}", s.c_str());  // Not reported.
}

// EOF

Shell script that runs clang-query:

#!/bin/sh

PATH=$HOME/opt/clang+llvm-16.0.0-x86_64-linux-gnu-ubuntu-18.04/bin:$PATH

query='m

  callExpr(
    callee(
      functionDecl(
        hasName("format")
      )
    ),
    forEachArgumentWithParam(
      expr(
        cxxMemberCallExpr(
          callee(
            cxxMethodDecl(
              hasName("c_str")
            )
          )
        )
      ).bind("arg"),
      anything()
    )
  ).bind("call")

'

if [ "x$1" = "x" ]; then
  echo "usage: $0 filename.cc -- <compile options like -I, etc.>"
  exit 2
fi

# Run the query.  Setting 'bind-root' to false means clang-query will
# not also print a redundant "root" binding.
clang-query \
  -c="set bind-root false" \
  -c="$query" \
  "$@"

# EOF

Output:

$ ./cmd.sh test-ft-vararg.cc --

Match #1:

$PWD/test-ft-vararg.cc:25:5: note: "arg" binds here
    s.c_str(),     // Reported.
    ^~~~~~~~~
$PWD/test-ft-vararg.cc:23:3: note: "call" binds here
  format("{}",
  ^~~~~~~~~~~~

Match #2:

$PWD/test-ft-vararg.cc:26:5: note: "arg" binds here
    s.c_str(),     // Reported.
    ^~~~~~~~~
$PWD/test-ft-vararg.cc:23:3: note: "call" binds here
  format("{}",
  ^~~~~~~~~~~~
2 matches.

Ordinary function with C-style variable arguments

Alternatively, if the callee looks like:

void format(char const *fmt, ...);

then we cannot use forEachArgumentWithParam, because the arguments passed through ... have no corresponding parameters. Instead, we have to use a somewhat ugly sequence of hasArgument(N, ...) matchers, one for each index N up to some large but fixed limit.

If you do not need any filtering on the arguments, then it suffices to use a chain of optionally(hasArgument(...)) as shown in this answer (of mine). But if, as in this case, we only want a match when at least one argument satisfies the condition, we have to first use anyOf(hasArgument(1,...), hasArgument(2,...), ...) to require that, then follow it with a chain of optionally(hasArgument(N,...)) to bind each matching argument, repeating the match condition in all of them.

Although verbose and ugly, this approach has one advantage: it yields all of the matching arguments at once, in a single match with each one bound under its own name (arg1, arg2, and so on).
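
Since the chain is purely mechanical, it can be generated rather than written by hand. The following is a minimal sketch, assuming the same c_str matcher and the same "argN" and "call" binding names as the shell script in the complete example below; buildVarargQuery and kExprMatcher are hypothetical names, not part of any Clang API. It only builds the query text and does not run clang-query.

```cpp
#include <cassert>
#include <string>

// Matcher text for a single argument; mirrors $exprMatcher in the script.
const std::string kExprMatcher =
    "expr(cxxMemberCallExpr(callee(cxxMethodDecl(hasName(\"c_str\")))))";

// Builds a single-line clang-query "m ..." command covering argument
// indices 1..maxArgs of calls to calleeName. Index 0 (the format string)
// is deliberately skipped.
std::string buildVarargQuery(const std::string &calleeName, int maxArgs) {
  // anyOf needs at least two alternatives, so require two or more slots.
  assert(maxArgs >= 2 && "need at least two argument slots for anyOf");

  std::string anyOfPart;     // requires at least one matching argument
  std::string optionalPart;  // binds each matching argument as argN
  for (int i = 1; i <= maxArgs; ++i) {
    const std::string index = std::to_string(i);
    if (i > 1)
      anyOfPart += ", ";
    anyOfPart += "hasArgument(" + index + ", " + kExprMatcher + ")";
    optionalPart += ", optionally(hasArgument(" + index + ", " +
                    kExprMatcher + ".bind(\"arg" + index + "\")))";
  }

  return "m callExpr(callee(functionDecl(hasName(\"" + calleeName +
         "\"))), anyOf(" + anyOfPart + ")" + optionalPart +
         ").bind(\"call\")";
}
```

The string returned by buildVarargQuery("format", 5) should be equivalent to the hand-written query in the script and can be passed to clang-query with -c in the same way. Raising the limit is then a matter of changing one number.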

Complete example

Input:

// test-of-vararg.cc
// Ordinary function that accepts a variable number of arguments.

void format(char const *fmt, ...)
{}

void otherFunc(char const *fmt, ...)
{}

struct S {
  int c_str();
  int other();
};

void f()
{
  S s;

  format("{}", 1, 2, 3);       // Not reported.

  format("{}",
    1,
    s.c_str(),     // Reported.
    s.c_str(),     // Reported.
    s.other());

  otherFunc("{}", s.c_str());  // Not reported.
}

// EOF

Shell script that runs clang-query:

#!/bin/sh

PATH=$HOME/opt/clang+llvm-16.0.0-x86_64-linux-gnu-ubuntu-18.04/bin:$PATH

exprMatcher="
  expr(
    cxxMemberCallExpr(
      callee(
        cxxMethodDecl(
          hasName(\"c_str\")
        )
      )
    )
  )"

query="m

  callExpr(
    callee(
      functionDecl(
        hasName(\"format\")
      )
    ),
    anyOf(
      hasArgument(1,
        $exprMatcher
      ),
      hasArgument(2,
        $exprMatcher
      ),
      hasArgument(3,
        $exprMatcher
      ),
      hasArgument(4,
        $exprMatcher
      ),
      hasArgument(5,
        $exprMatcher
      )
    ),
    optionally(
      hasArgument(1,
        $exprMatcher .bind(\"arg1\")
      )
    ),
    optionally(
      hasArgument(2,
        $exprMatcher .bind(\"arg2\")
      )
    ),
    optionally(
      hasArgument(3,
        $exprMatcher .bind(\"arg3\")
      )
    ),
    optionally(
      hasArgument(4,
        $exprMatcher .bind(\"arg4\")
      )
    ),
    optionally(
      hasArgument(5,
        $exprMatcher .bind(\"arg5\")
      )
    )
  ).bind(\"call\")

"

if [ "x$1" = "x" ]; then
  echo "usage: $0 filename.cc -- <compile options like -I, etc.>"
  exit 2
fi

# Run the query.  Setting 'bind-root' to false means clang-query will
# not also print a redundant "root" binding.
clang-query \
  -c="set bind-root false" \
  -c="$query" \
  "$@"

# EOF

Output:

$ ./cmd2.sh test-of-vararg.cc --

Match #1:

$PWD/test-of-vararg.cc:23:5: note: "arg2" binds here
    s.c_str(),     // Reported.
    ^~~~~~~~~
$PWD/test-of-vararg.cc:24:5: note: "arg3" binds here
    s.c_str(),     // Reported.
    ^~~~~~~~~
$PWD/test-of-vararg.cc:21:3: note: "call" binds here
  format("{}",
  ^~~~~~~~~~~~
1 match.

Upvotes: 2
