Farzad Sadeghi

Reputation: 125

Why is the source location end off by two characters for a statement ending in a semicolon?

I'm trying to write a source to source translator using libTooling.

I'm using ASTMatchers to try to find if statements that don't have curly braces and then use a rewriter to add the braces.

The matcher I'm using is:

  ifStmt(unless(hasDescendant(compoundStmt()))).bind("ifStmt")

Then I take the start and end locations of the then-branch and insert the curly braces with a Rewriter.

Here's the source code for that:

  if (const IfStmt *IfS = Result.Nodes.getNodeAs<clang::IfStmt>("ifStmt")) {
    const Stmt *Then = IfS->getThen();
    Rewrite.InsertText(Then->getLocStart(), "{", true, true);
    Rewrite.InsertText(Then->getLocEnd(), "}", true, true);
  }

Now the problem is that for some reason the end location is always off by 2 characters. Why is this so?

Upvotes: 3

Views: 615

Answers (2)

Scott McPeak

Reputation: 12863

This is a general issue with the Clang AST: it usually does not record the location of the final semicolon of a statement that ends in one. See the discussion "Extend Stmt with proper end location?" on the LLVM Discourse server.
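To make the "off by two" concrete, here is a toy model in plain C++ with no Clang involved. It assumes that, for the then-branch x = 5; of if (c) x = 5;, the end location points at the start of the last token 5 (not past the ;). The hypothetical helper insertBraces mimics the two InsertText calls from the question, with hand-computed string offsets standing in for the SourceLocations.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Toy model of the question's two InsertText calls, on a plain string.
// beginOff / endOff are hand-computed offsets standing in for the
// SourceLocations returned by Then->getLocStart() / Then->getLocEnd().
std::string insertBraces(std::string src, std::size_t beginOff, std::size_t endOff) {
    assert(beginOff <= endOff && endOff <= src.size());
    src.insert(endOff, "}");   // insert at the later offset first...
    src.insert(beginOff, "{"); // ...so beginOff still points at the same character
    return src;
}
```

With src = "if (c) x = 5;", inserting at the offset of 5 (11) yields "if (c) {x = }5;", while inserting just past the ; (13) yields the intended "if (c) {x = 5;}". The gap is the last token plus the semicolon, which is two characters here.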

To solve this problem, the usual approach is to start with the end location as stored in the AST, then use the Lexer class to advance forward until the semicolon is found. This is not 100% reliable because there can be intervening macros and preprocessing directives, but fortunately that is uncommon for the final semicolon of a statement.
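The forward-scanning idea can be sketched on a plain string as follows. This is a simplified model, not Clang's Lexer: toyTokenLength is a hypothetical stand-in for Lexer::MeasureTokenLength that only understands identifiers, numbers and single-character punctuation, and toyFindSemiAfter mirrors the non-declaration path of findSemiAfterLocation by skipping whitespace and comments and giving up if the next token is not a semicolon.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical stand-in for Lexer::MeasureTokenLength: a run of
// [A-Za-z0-9_] is one token, any other character is a one-character token.
// (String literals, "->", "::" and so on are NOT handled.)
std::size_t toyTokenLength(const std::string& src, std::size_t off) {
    assert(off < src.size());
    auto isWord = [](unsigned char c) { return std::isalnum(c) || c == '_'; };
    if (!isWord(static_cast<unsigned char>(src[off]))) return 1;
    std::size_t end = off;
    while (end < src.size() && isWord(static_cast<unsigned char>(src[end]))) ++end;
    return end - off;
}

// Step over the statement's last token, skip whitespace and comments, and
// return the offset of the ';' if it is the next token, else npos.
std::size_t toyFindSemiAfter(const std::string& src, std::size_t lastTokStart) {
    std::size_t i = lastTokStart + toyTokenLength(src, lastTokStart);
    while (i < src.size()) {
        if (std::isspace(static_cast<unsigned char>(src[i]))) {
            ++i;
        } else if (src.compare(i, 2, "//") == 0) {
            i = src.find('\n', i);               // skip a line comment
            if (i == std::string::npos) return std::string::npos;
        } else if (src.compare(i, 2, "/*") == 0) {
            std::size_t close = src.find("*/", i + 2);
            if (close == std::string::npos) return std::string::npos;
            i = close + 2;                       // skip a block comment
        } else {
            return src[i] == ';' ? i : std::string::npos;
        }
    }
    return std::string::npos;
}
```

The closing brace would then go at the returned offset plus one. Like the real technique, this model cannot see through macros or preprocessing directives between the last token and the semicolon.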

There is an example of doing this in clang::arcmt::trans::findSemiAfterLocation in the Clang source code. The essence is these lines:

  // Lex from the start of the given location.
  Lexer lexer(SM.getLocForStartOfFile(locInfo.first),
              Ctx.getLangOpts(),
              file.begin(), tokenBegin, file.end());
  Token tok;
  lexer.LexFromRawLexer(tok);
  if (tok.isNot(tok::semi)) {
    if (!IsDecl)
      return SourceLocation();
    // Declaration may be followed with other tokens; such as an __attribute,
    // before ending with a semicolon.
    return findSemiAfterLocation(tok.getLocation(), Ctx, /*IsDecl*/true);
  }

Upvotes: 1

Farzad Sadeghi

Reputation: 125

The SourceLocation I was getting is off because the end location points at the start of the statement's last token, and the ";" is not part of the statement's range at all. By the way, if anybody's wondering how to include the ";" in the range: call Lexer::MeasureTokenLength on the end location, add one to the result, and get the new SourceLocation with getLocWithOffset.
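The arithmetic of this shortcut, modeled on a plain string: lastTokStart plays the role of the end SourceLocation, and the crude word-run measurement is a hypothetical stand-in for Lexer::MeasureTokenLength. The sketch assumes the ";" immediately follows the last token, which is exactly where the shortcut differs from lexing forward.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Toy model of "MeasureTokenLength + 1, then offset the end location".
std::size_t shortcutPastSemi(const std::string& src, std::size_t lastTokStart) {
    assert(lastTokStart < src.size());
    auto isWord = [&](std::size_t i) {
        unsigned char c = static_cast<unsigned char>(src[i]);
        return std::isalnum(c) || c == '_';
    };
    // Crude token length: a run of word characters, else one character.
    std::size_t len = 1;
    if (isWord(lastTokStart))
        while (lastTokStart + len < src.size() && isWord(lastTokStart + len))
            ++len;
    // The shortcut: assume ';' sits right after the last token, so
    // token length + 1 lands just past it.
    return lastTokStart + len + 1;
}
```

For "x = 5;" and "x = foo;" this gives the offset just past the semicolon (6 and 8). For "x = 5 ;" it also gives 6, which is the semicolon itself rather than the position after it, so any whitespace or comment before the ";" breaks the shortcut.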

Upvotes: 2
