Reputation: 2978
I need to build a "translator" (is cross-compiler the right word?) from Tradestation's EasyLanguage to C++. However, I could not find any complete documentation of the EasyLanguage grammar.
As a more general question: given a set of valid programs in some language 'A', is it possible to discern a grammar for 'A' if we know (or even if we don't know) of the existence of certain basic tokens like 'if', 'else', and other reserved words? Or is this one of those unsolved, case-specific (hard?) questions?
Are there any useful tools I can use to get started?
Upvotes: 3
Views: 1146
Reputation: 95334
The simple answer is "No".
Any kind of generalization from examples suffers from the basic fact that it is guessing. You may guess that the language has an 'if' token. There's no guarantee that it does, or that it is spelled 'if', or that it has semantics that you understand. You're not going to get an automated tool to induce the grammar for you.
Your best bet is to take all the documents you can get that describe the language, and, well, guess at a grammar. Then you build a parser for the grammar, validate it against as big a code base as you can find, and revise. I've done this dozens of times with a wide variety of languages (see my bio).
It is painful, but you often get someplace pretty useful. The good news is that your parser doesn't have to parse anything the users don't know how to write. The bad news is they'll write things based on some obscure example you've never seen, or with a typo that accidentally works. (Even if the language designer didn't intend it, that doesn't matter to the user: his program works and your compiler doesn't. Your problem, by definition.)
What you'll never know is whether the provider of the language has certain features he simply hasn't documented and hasn't shown anyone else. Be continually prepared to be surprised, long after you are done :-{
Now, the best tool you can use for this process IMHO is a GLR parser generator; it is what my company uses. These will parse any context-free language (that you might propose) without a lot of struggle to bend the grammar to match the restrictions of the other common approaches: recursive descent, LL(k), or LR(k) parsers. Life is hard enough guessing the grammar, let alone guessing the grammar and then guessing how to bend it to make the parser generator swallow it correctly.
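The guess, parse, validate, revise loop described above can be sketched as a small regression harness. This is only an illustration in Python, not a real EasyLanguage parser: `validate_grammar` and `toy_parse` are hypothetical names, the corpus is an in-memory dict standing in for a directory of real programs, and the toy parser merely assumes that blocks are delimited by case-insensitive `begin`/`end` keywords. In practice you would plug in the parser generated from your current grammar guess.

```python
def validate_grammar(parse, corpus):
    """Run `parse` over every program in `corpus` (name -> source text).

    Returns (passed, failures), where failures maps a program name to the
    error message, so each revision of the grammar can be checked again.
    """
    passed, failures = [], {}
    for name, source in corpus.items():
        try:
            parse(source)
            passed.append(name)
        except SyntaxError as err:
            failures[name] = str(err)
    return passed, failures


def toy_parse(source):
    """Hypothetical stand-in for a real parser: checks begin/end balance only."""
    depth = 0
    for lineno, line in enumerate(source.splitlines(), 1):
        for word in line.replace(";", " ").split():
            keyword = word.lower()  # assumption: keywords are case-insensitive
            if keyword == "begin":
                depth += 1
            elif keyword == "end":
                depth -= 1
                if depth < 0:
                    raise SyntaxError(f"line {lineno}: 'end' without 'begin'")
    if depth:
        raise SyntaxError("unclosed 'begin'")


# Tiny made-up corpus; real validation needs as much user code as you can find.
corpus = {
    "ok.el": "If x then begin\n  y = 1;\nend;",
    "bad.el": "begin\n  y = 1;",
}
passed, failures = validate_grammar(toy_parse, corpus)
for name, message in failures.items():
    print(f"{name}: {message}")
```

Every program that lands in `failures` is either a bug in the guessed grammar or one of those obscure constructs users really do write; either way it is worth keeping in the corpus as a regression test.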
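To illustrate what "no bending" buys you, here is a minimal sketch of a general context-free recognizer. Note that it uses Earley's algorithm rather than GLR, simply because Earley fits in a few lines; both accept arbitrary context-free grammars. The function name and the toy grammar are assumptions for illustration, and this simplified version does not handle empty (epsilon) productions. The grammar is deliberately left-recursive and ambiguous, which is exactly the shape that recursive descent, LL(k), and LR(k) tools force you to rewrite.

```python
def earley_recognize(grammar, start, tokens):
    """Return True if `tokens` can be derived from `start`.

    `grammar` maps each nonterminal to a list of right-hand sides (tuples).
    Any symbol that is not a key of `grammar` is treated as a terminal.
    Limitation of this sketch: right-hand sides must not be empty.
    """
    # Each chart item is (lhs, rhs, dot position, origin index).
    chart = [set() for _ in range(len(tokens) + 1)]
    for rhs in grammar[start]:
        chart[0].add((start, rhs, 0, 0))

    for i in range(len(tokens) + 1):
        worklist = list(chart[i])
        while worklist:
            lhs, rhs, dot, origin = worklist.pop()
            if dot < len(rhs):
                symbol = rhs[dot]
                if symbol in grammar:
                    # Predict: expand the nonterminal after the dot.
                    new_items = [(symbol, alt, 0, i) for alt in grammar[symbol]]
                elif i < len(tokens) and tokens[i] == symbol:
                    # Scan: the terminal matches the next input token.
                    chart[i + 1].add((lhs, rhs, dot + 1, origin))
                    continue
                else:
                    continue
            else:
                # Complete: advance every item that was waiting for `lhs`.
                new_items = [
                    (l2, r2, d2 + 1, o2)
                    for (l2, r2, d2, o2) in chart[origin]
                    if d2 < len(r2) and r2[d2] == lhs
                ]
            for item in new_items:
                if item not in chart[i]:
                    chart[i].add(item)
                    worklist.append(item)

    return any(
        lhs == start and dot == len(rhs) and origin == 0
        for (lhs, rhs, dot, origin) in chart[len(tokens)]
    )


# Left-recursive, ambiguous expression grammar, written exactly as guessed.
GRAMMAR = {"E": [("E", "+", "E"), ("E", "*", "E"), ("n",)]}

print(earley_recognize(GRAMMAR, "E", "n + n * n".split()))  # True
print(earley_recognize(GRAMMAR, "E", "n + * n".split()))    # False
```

A production GLR tool additionally builds the parse trees and lets you resolve ambiguities afterwards; the point here is only that the grammar can stay in the form you guessed it.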
You also have the problem of building a translator, once you get the grammar right. You might find this SO answer helpful: "What kinds of patterns could I enforce on the code to make it easier to translate to another programming language?"
Upvotes: 5