LunaticJape

Reputation: 1584

How to match std::stringstream through AST?

Target code block:

int age = 5;
std::stringstream q;
q << "my name "
<< "is Tom, "
<< "my age is " << age;

I'm trying to write a matcher that matches the whole block, starting from the second line.

I tried a very case-specific matcher, but it does not work. My goal is to match similar code blocks in which the number of LHS and RHS operands is not fixed.

cxxOperatorCallExpr(hasOverloadedOperatorName("<<"),
  hasLHS(ignoringParenImpCasts(expr(anyOf(
    stringLiteral().bind("firstString"),
    stringLiteral().bind("secondString"),
    stringLiteral().bind("thirdString"))))),
  hasRHS(ignoringParenImpCasts(expr(anyOf(
    declRefExpr(to(varDecl(hasType(cxxRecordDecl(hasName("std::stringstream")))))),
    declRefExpr(hasType(isInteger())))))))

As a simpler test, I tried to match just the line std::stringstream q; with varDecl(hasType(cxxRecordDecl(hasName("std::stringstream")))), but that returns no matches either.

How should I start, and how do I build a matcher from the AST dump?

Appreciate any help.

Upvotes: 1

Views: 59

Answers (1)

Scott McPeak

Reputation: 12749

Match a variable declaration of type std::stringstream

Starting with the second question, to match a variable declaration of type std::stringstream, use a Clang AST matcher like this:

varDecl(
  hasType(
    asString("std::stringstream")
  )
)

The asString matcher turns the type into a string, making it reasonably easy to use.

The attempted matcher in the question:

varDecl(hasType(cxxRecordDecl(hasName("std::stringstream"))))

fails for a few reasons:

  • The sequence hasType(cxxRecordDecl(...)) uses the hasType overload that takes a declaration matcher (cxxRecordDecl matches a declaration, a piece of syntax, whereas a type is an abstract semantic notion). That overload matches only when the declaration behind the variable's type is a class declaration satisfying the inner matcher. Here the declaration behind the type is a typedef (see the last bullet), so cxxRecordDecl(hasName("std::stringstream")) has nothing to match.

  • The actual type in this case is an ElaboratedType, which Clang seemingly sprinkles throughout its Type structures in a way I find somewhat unpredictable. That makes it hard in general to match Types.

  • The type to which the name std::stringstream refers is a TypedefType for a template specialization, rather than a stand-alone class, so cxxRecordDecl won't match. (Note: TypedefType is used for type aliases created with either typedef or using.)
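
The alias relationship in the last bullet can be checked without Clang, because the C++ standard library defines std::stringstream as another name for std::basic_stringstream<char>. A minimal sketch, in which the helper name stringstreamIsAlias is made up for illustration:

```cpp
#include <sstream>      // std::stringstream, std::basic_stringstream
#include <type_traits>  // std::is_same

// Hypothetical helper: true when std::stringstream is just another name
// for the template specialization std::basic_stringstream<char>.
constexpr bool stringstreamIsAlias()
{
  return std::is_same<std::stringstream,
                      std::basic_stringstream<char>>::value;
}

// Compilation fails here if the alias relationship does not hold.
static_assert(stringstreamIsAlias(),
              "std::stringstream should alias std::basic_stringstream<char>");
```

Since the two names denote the same type, the class declaration Clang ultimately reaches is the basic_stringstream template specialization, not something named stringstream.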

If you want to match a declaration of a std::stringstream variable without using asString, use this more complicated matcher:

varDecl(
  hasType(
    elaboratedType(
      namesType(
        typedefType(
          hasDeclaration(
            namedDecl(
              hasName("std::stringstream")
            )
          )
        )
      )
    )
  )
)

Match operator<< applied to a stringstream

Armed with the above, we can try to match a use of operator<< where the left-hand side is a stringstream. The question has this example expression:

q << "my name " << "is Tom, " << "my age is " << age;

This is parsed as a nested tree of binary operators:

(((q << "my name ") << "is Tom, ") << "my age is ") << age;
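
This grouping can be observed with ordinary C++, independent of Clang. In the sketch below, Trace and groupingOfChain are made-up names and the operands are replaced by short labels; the overloaded operator<< records how its calls nest instead of writing any output.

```cpp
#include <string>

// Hypothetical stand-in for a stream: it records how the compiler
// grouped the operator<< calls instead of producing output.
struct Trace {
  std::string text;
};

// Each call wraps its left operand in parentheses, so the nesting of
// the resulting string mirrors the nesting of the calls.
Trace operator<<(const Trace &lhs, const std::string &rhs)
{
  return Trace{"(" + lhs.text + " << " + rhs + ")"};
}

// Builds a chain of the same shape as the one in the question, with
// labels in place of the real operands.
std::string groupingOfChain()
{
  Trace q{"q"};
  return (q << "a" << "b" << "c" << "age").text;
}
```

Calling groupingOfChain() yields ((((q << a) << b) << c) << age): the outermost call is the one whose right operand is age, and q sits at the bottom of the chain of left-hand sides.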

So we need to look for a use of operator<< where somewhere in the left-hand side is a stringstream, and the right-hand side is either a string literal or an integer-valued expression (I'm partly guessing the intent based on the attempted matcher in the question). That can be done like so:

cxxOperatorCallExpr(
  hasOverloadedOperatorName("<<"),
  hasLHS(
    hasDescendant(
      expr(
        hasType(
          asString("std::stringstream")
        )
      ).bind("stringStreamExpr")
    )
  ),
  hasRHS(
    ignoringParenImpCasts(
      expr(
        anyOf(
          stringLiteral(
          ).bind("stringLiteral"),
          expr(
            hasType(
              isInteger()
            )
          ).bind("intExpr")
        )
      )
    )
  )
)

This will separately report matches for each occurrence of operator<<. It does not try to report a single match for the entire compound expression with all of the various arguments at once; the AST matcher language is not really powerful enough to do that robustly.
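
If the full operand list is needed, one option outside the matcher language is to start from the outermost call and follow the left-hand sides down to the stream. The sketch below shows that walk on a toy tree. Node, leaf, shl, collectOperands, and questionChain are hypothetical names, not Clang classes; a real tool would have to do the equivalent walk over the AST nodes it matched.

```cpp
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Toy expression node (not a Clang class): a leaf has only a label,
// an operator node has both children set.
struct Node {
  std::string label;
  std::shared_ptr<Node> lhs;
  std::shared_ptr<Node> rhs;
};

// Creates a leaf operand such as a variable or a literal.
std::shared_ptr<Node> leaf(std::string label)
{
  auto n = std::make_shared<Node>();
  n->label = std::move(label);
  return n;
}

// Creates a node standing for "lhs << rhs".
std::shared_ptr<Node> shl(std::shared_ptr<Node> lhs, std::shared_ptr<Node> rhs)
{
  auto n = std::make_shared<Node>();
  n->label = "<<";
  n->lhs = std::move(lhs);
  n->rhs = std::move(rhs);
  return n;
}

// Follows the left-hand sides from the outermost call, collecting each
// right operand on the way; the leaf reached at the end is the stream.
// The result is reversed so that operands appear in source order.
std::vector<std::string> collectOperands(const std::shared_ptr<Node> &root)
{
  std::vector<std::string> reversed;
  const Node *cur = root.get();
  while (cur && cur->lhs && cur->rhs) {
    reversed.push_back(cur->rhs->label);
    cur = cur->lhs.get();
  }
  if (cur) {
    reversed.push_back(cur->label);
  }
  return std::vector<std::string>(reversed.rbegin(), reversed.rend());
}

// Toy version of the chain from the question.
std::shared_ptr<Node> questionChain()
{
  return shl(shl(shl(shl(leaf("q"), leaf("my name ")), leaf("is Tom, ")),
                 leaf("my age is ")),
             leaf("age"));
}
```

collectOperands(questionChain()) returns the five labels in source order, starting with q and ending with age. The walk only needs the outermost node, which is why doing it in ordinary code after matching is simpler than trying to express it as one matcher.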

Complete example

Here is a shell script that runs clang-query with the above matcher:

#!/bin/sh

PATH=/d/opt/clang+llvm-18.1.8-msvc/bin:$PATH

matcher='
  cxxOperatorCallExpr(
    hasOverloadedOperatorName("<<"),
    hasLHS(
      hasDescendant(
        expr(
          hasType(
            asString("std::stringstream")
          )
        ).bind("stringStreamExpr")
      )
    ),
    hasRHS(
      ignoringParenImpCasts(
        expr(
          anyOf(
            stringLiteral(
            ).bind("stringLiteral"),
            expr(
              hasType(
                isInteger()
              )
            ).bind("intExpr")
          )
        )
      )
    )
  )
'

clang-query \
  -c "set bind-root false" \
  -c "m $matcher" \
  test.cc --

# EOF

Test input file test.cc:

// test.cc
// Match an `operator<<` expression involving `std::stringstream`.

#include <sstream>                     // std::stringstream

void f()
{
  int age = 5;
  std::stringstream q;
  q << "my name "
  << "is Tom, "
  << "my age is " << age;
}

// EOF

Output of the script:

Match #1:

$PWD\test.cc:12:22: note: 
      "intExpr" binds here
   12 |   << "my age is " << age;
      |                      ^~~
$PWD\test.cc:10:3: note: 
      "stringStreamExpr" binds here
   10 |   q << "my name "
      |   ^

Match #2:

$PWD\test.cc:12:6: note: 
      "stringLiteral" binds here
   12 |   << "my age is " << age;
      |      ^~~~~~~~~~~~
$PWD\test.cc:10:3: note: 
      "stringStreamExpr" binds here
   10 |   q << "my name "
      |   ^

Match #3:

$PWD\test.cc:11:6: note: 
      "stringLiteral" binds here
   11 |   << "is Tom, "
      |      ^~~~~~~~~~
$PWD\test.cc:10:3: note: 
      "stringStreamExpr" binds here
   10 |   q << "my name "
      |   ^

Match #4:

$PWD\test.cc:10:8: note: 
      "stringLiteral" binds here
   10 |   q << "my name "
      |        ^~~~~~~~~~
$PWD\test.cc:10:3: note: 
      "stringStreamExpr" binds here
   10 |   q << "my name "
      |   ^
4 matches.

Issues with the matcher in the question

The question has this attempted matcher:

cxxOperatorCallExpr(hasOverloadedOperatorName("<<"),
  hasLHS(ignoringParenImpCasts(expr(anyOf(
    stringLiteral().bind("firstString"),
    stringLiteral().bind("secondString"),
    stringLiteral().bind("thirdString"))))),
  hasRHS(ignoringParenImpCasts(expr(anyOf(
    declRefExpr(to(varDecl(hasType(cxxRecordDecl(hasName("std::stringstream")))))),
    declRefExpr(hasType(isInteger())))))))

This has several problems worth identifying as part of clarifying how the AST is structured and how matchers work:

  • It seems to be looking for string literals on the left-hand side and stringstream variables on the right-hand side; those should be swapped.

  • It is trying to match multiple string literals inside anyOf, but anyOf will stop after the first branch succeeds. If you change that to allOf, then all of the bindings will just go to the same expression. In some cases, forEachDescendant can be used to do something like this, but I find it difficult to use robustly, and it would not apply here (or at least I don't see how to make it work in this case).

  • It seems to expect the entire statement to be a single cxxOperatorCallExpr, but it's actually a tree of them.

  • It has the issues with recognizing stringstream variables explained in the first part of this answer.

Upvotes: 2
