Reputation: 11958
Goal
I'm working on a project to create a Varscoper for Coldfusion CFscript. Basically, this means checking through source code files to ensure that developers have properly var
'd their variables.
After a couple of days of working with ANTLR V4 I have a grammar which generates a very nice parse tree in the GUI view. Now, using that tree I need a way to crawl up and down the nodes programmatically looking for variable declarations and ensure that if they are inside functions they have the proper scoping. If possible I would rather NOT do this in the grammar file as that would require mixing the definition of the language with this specific task.
What I've tried
My latest attempt was using the ParserRuleContext
and attempting to go through it's children
via getPayload()
. After checking the class of getPayLoad()
I would either have a ParserRuleContext
object or a Token
object. Unfortunately, using that I was never able to find a way to get the actual rule type for a specific node, only it's containing text. The rule type for each node is neccessary because it matters whether that text node is an ignored right-hand expression, a variable assignment or a function declaration.
Questions
Here's my sample java code:
Cfscript.java
import org.antlr.v4.runtime.*;
import org.antlr.v4.runtime.tree.Trees;
public class Cfscript {
public static void main(String[] args) throws Exception {
ANTLRInputStream input = new ANTLRFileStream(args[0]);
CfscriptLexer lexer = new CfscriptLexer(input);
CommonTokenStream tokens = new CommonTokenStream(lexer);
CfscriptParser parser = new CfscriptParser(tokens);
parser.setBuildParseTree(true);
ParserRuleContext tree = parser.component();
tree.inspect(parser); // show in gui
/*
Recursively go though tree finding function declarations and ensuring all variableDeclarations are varred
but how?
*/
}
}
Cfscript.g4
grammar Cfscript;
component
: 'component' keyValue* '{' componentBody '}'
;
componentBody
: (componentElement)*
;
componentElement
: statement
| functionDeclaration
;
functionDeclaration
: Identifier? Identifier? 'function' Identifier argumentsDefinition '{' functionBody '}'
;
argumentsDefinition
: '(' argumentDefinition (',' argumentDefinition)* ')'
| '()'
;
argumentDefinition
: Identifier? Identifier? argumentName ('=' expression)?
;
argumentName
: Identifier
;
functionBody
: (statement)*
;
statement
: variableStatement
| nonVarVariableStatement
| expressionStatement
;
variableStatement
: 'var' variableName '=' expression ';'
;
nonVarVariableStatement
: variableName '=' expression ';'
;
expressionStatement
: expression ';'
;
expression
: assignmentExpression
| arrayLiteral
| objectLiteral
| StringLiteral
| incrementExpression
| decrementExpression
| 'true'
| 'false'
| Identifier
;
incrementExpression
: variableName '++'
;
decrementExpression
: variableName '--'
;
assignmentExpression
: Identifier (assignmentExpressionSuffix)*
| assignmentExpression (('+'|'-'|'/'|'*') assignmentExpression)+
;
assignmentExpressionSuffix
: '.' assignmentExpression
| ArrayIndex
| ('()' | '(' expression (',' expression)* ')' )
;
methodCall
: Identifier ('()' | '(' expression (',' expression)* ')' )
;
variableName
: Identifier (variableSuffix)*
;
variableSuffix
: ArrayIndex
| '.' variableName
;
arrayLiteral
: '[' expression (',' expression)* ']'
;
objectLiteral
: '{' (Identifier '=' expression (',' Identifier '=' expression)*)? '}'
;
keyValue
: Identifier '=' StringLiteral
;
StringLiteral
: '"' (~('\\'|'"'))* '"'
;
ArrayIndex
: '[' [1-9] [0-9]* ']'
| '[' StringLiteral ']'
;
Identifier
: [a-zA-Z0-9]+
;
WS
: [ \t\r\n]+ -> skip
;
COMMENT
: '/*' .*? '*/' -> skip
;
Test.cfc (testing code file)
component something = "foo" another = "more" persistent = "true" datasource = "#application.env.dsn#" {
var method = something.foo.test1;
testing = something.foo[10];
testingagain = something.foo["this is a test"];
nuts["testing"]++;
blah.test().test3["test"]();
var math = 1 + 2 - blah.test().test4["test"];
var test = something;
var testing = somethingelse;
var testing = {
test = more,
mystuff = {
interior = test
},
third = "third key"
};
other = "Idunno homie";
methodCall(interiorMethod());
public function bar() {
var new = "somebody i used to know";
something = [1, 2, 3];
}
function nuts(required string test1 = "first", string test = "second", test3 = "third") {
}
private boolean function baz() {
var this = "something else";
}
}
Upvotes: 38
Views: 24818
Reputation: 170158
I wouldn't walk this manually if I were you. After generating a lexer and parser, ANTLR would also have generated a file called CfscriptBaseListener
that has empty methods for all of your parser rules. You can let ANTLR walk your tree and attach a custom tree-listener in which you override only those methods/rules you're interested in.
In your case, you probably want to be notified whenever a new function is created (to create a new scope) and you'll probably be interested in variable assignments (variableStatement
and nonVarVariableStatement
). Your listener, let's call is VarListener
will keep track of all scopes as ANTLR walks the tree.
I did change 1 rule slightly (I added objectLiteralEntry
):
objectLiteral : '{' (objectLiteralEntry (',' objectLiteralEntry)*)? '}' ; objectLiteralEntry : Identifier '=' expression ;
which makes life easier in the following demo:
public class VarListener extends CfscriptBaseListener {
private Stack<Scope> scopes;
public VarListener() {
scopes = new Stack<Scope>();
scopes.push(new Scope(null));
}
@Override
public void enterVariableStatement(CfscriptParser.VariableStatementContext ctx) {
String varName = ctx.variableName().getText();
Scope scope = scopes.peek();
scope.add(varName);
}
@Override
public void enterNonVarVariableStatement(CfscriptParser.NonVarVariableStatementContext ctx) {
String varName = ctx.variableName().getText();
checkVarName(varName);
}
@Override
public void enterObjectLiteralEntry(CfscriptParser.ObjectLiteralEntryContext ctx) {
String varName = ctx.Identifier().getText();
checkVarName(varName);
}
@Override
public void enterFunctionDeclaration(CfscriptParser.FunctionDeclarationContext ctx) {
scopes.push(new Scope(scopes.peek()));
}
@Override
public void exitFunctionDeclaration(CfscriptParser.FunctionDeclarationContext ctx) {
scopes.pop();
}
private void checkVarName(String varName) {
Scope scope = scopes.peek();
if(scope.inScope(varName)) {
System.out.println("OK : " + varName);
}
else {
System.out.println("Oops : " + varName);
}
}
}
A Scope
object could be as simple as:
class Scope extends HashSet<String> {
final Scope parent;
public Scope(Scope parent) {
this.parent = parent;
}
boolean inScope(String varName) {
if(super.contains(varName)) {
return true;
}
return parent == null ? false : parent.inScope(varName);
}
}
Now, to test this all, here's a small main class:
import org.antlr.v4.runtime.*;
import org.antlr.v4.runtime.tree.*;
public class Main {
public static void main(String[] args) throws Exception {
CfscriptLexer lexer = new CfscriptLexer(new ANTLRFileStream("Test.cfc"));
CfscriptParser parser = new CfscriptParser(new CommonTokenStream(lexer));
ParseTree tree = parser.component();
ParseTreeWalker.DEFAULT.walk(new VarListener(), tree);
}
}
If you run this Main
class, the following will be printed:
Oops : testing Oops : testingagain OK : test Oops : mystuff Oops : interior Oops : third Oops : other Oops : something
Without a doubt, this is not exactly what you want and I probably goofed up some scoping rules of Coldfusion. But I think this will give you some insight in how to solve your problem properly. I think the code is pretty self explanatory, but if this is not the case, don't hesitate to ask for clarification.
HTH
Upvotes: 50