Reputation: 34609
I'm building a program with ANTLR where I ask the user to enter some Java code, and it spits out equivalent C# code. In my program, I ask the user to enter some Java code and then parse it. Up until now I've been assuming that they will enter something that will parse as a valid compilation unit on its own, e.g. something like
package foo;
class A { ... }
class B { ... }
class C { ... }
However, that isn't always the case. They might just enter code from the inside of a class:
public void method1() {
...
}
public void method2() {
...
}
Or the inside of a method:
System.out.print("hello ");
System.out.println("world!");
Or even just an expression:
context.getSystemService(Context.ACTIVITY_SERVICE)
If I try to parse such snippets by calling parser.compilationUnit()
, it won't work correctly because most of the code is parsed as error nodes. I need to call the correct method depending on the nature of the code, such as parser.expression()
or parser.blockStatements()
. However, I don't want to ask the user to explicitly indicate this. What's the best way to infer what kind of code I'm parsing?
Upvotes: 4
Views: 343
Reputation: 4501
Our algorithm in Swiftify tries to select the best suitable parse rule from the defined rule set. This web-service converts Objective-C code fragments to Swift and you can estimate the quality of conversion immediately by your own.
We use open-sourced ObjectiveC grammar. Detail Steps of algorithm look like this:
translationUnit
implementationDefinitionList
interfaceDeclarationList
expression
compoundStatement
There are test code samples that parsed with different parse rules:
translationUnit
: http://swiftify.me/clye5zimplementationDefinitionList
: http://swiftify.me/fpaszainterfaceDeclarationList
: http://swiftify.me/13rv2jcompoundStatement
: http://swiftify.me/4cpl9nOur algorithm is able to detect suitably parse rule even with an incorrect input:
compoundStatement
with errors: http://swiftify.me/13rv2j/1Upvotes: 1
Reputation: 6001
Rather than trying to guess a valid grammar rule entry point to parse a language snippet of unknown scope, progressively add scope wrappers to the source text until a valid top-level rule parse is achieved.
That is, with each successive parse failure, progressively add dummy package, class, & method statements as source text wrappers.
Whichever wrapper was added to achieve a successful parse will then be a known quantity. Therefore, the parse tree node representing the original source text can be easily identified.
Probably want to use a fail-fast parser; construct the parser with the BailErrorStrategy
to obtain this behavior.
Upvotes: 1