peoro

Reputation: 26060

Why don't compilers translate into simpler languages?

Usually compilers translate from the language they support to assembly, or at most to an assembly-like intermediate language (bytecode), like GIMPLE/GENERIC for GCC or Python/Java/.NET bytecode.

Wouldn't it be simpler for a compiler to translate to a simpler language that already implements a big subset of its grammar?

For example, an Objective-C compiler, since Objective-C is 100% compatible with C, would only need to add semantics for the syntax that extends C, translating it into C. I can see many advantages of doing this: one could use this Objective-C compiler to translate code into C and then compile the generated C with a different compiler that doesn't support Objective-C (but that optimizes better, compiles more quickly, or targets more architectures). Or one could use the generated C code in a project where only C is allowed.
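To make the idea concrete, here is a minimal sketch in Python of such a "translate only the extension" pass. It is a toy under loud assumptions: it handles only no-argument message sends of the form `[receiver selector]`, it uses a regular expression instead of a parser, and the helper names `translate_line` and `translate` are made up for illustration. `objc_msgSend` and `sel_registerName` are real Objective-C runtime functions, but a real compiler's lowering is considerably more involved than this.

```python
import re

# Matches a no-argument message send such as `[obj description]`.
# Deliberately naive: the pattern knows nothing about C's own grammar.
MESSAGE_SEND = re.compile(r"\[\s*(\w+)\s+(\w+)\s*\]")


def translate_line(line: str) -> str:
    """Rewrite `[receiver selector]` into a plain C call; copy the rest as-is."""
    return MESSAGE_SEND.sub(
        r'objc_msgSend(\1, sel_registerName("\2"))', line
    )


def translate(source: str) -> str:
    """Translate a whole source text line by line."""
    return "\n".join(translate_line(line) for line in source.splitlines())


if __name__ == "__main__":
    print(translate("int n = 3;\nid s = [obj description];"))
```

Plain C lines pass through untouched, which is the attraction of the approach. The same sketch also hints at a penalty: a valid C declaration such as `int a[sizeof x];` matches the pattern too and gets mangled, so a robust translator would still need to parse C properly.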

I guess/hope that if things worked like this, it would be a lot easier to write extensions for current languages (e.g. adding keywords to C++ to ease the implementation of common patterns or, still in C++, removing the declare-before-use rule by moving inline member functions to the end of header files).

What kind of penalties would there be? Would the generated code be very difficult for humans to understand? Would compilers be unable to optimize as much as they can now? What else?

Upvotes: 9

Views: 709

Answers (5)

MauganRa

Reputation: 495

Haskell is actually compiled this way: the GHC compiler first translates the source code to an intermediate functional language (which is less rich than Haskell itself), performs optimizations, and then lowers the whole thing to C code, which is then compiled by GCC. This solution has problems, though, and projects were started to replace this backend.

http://blog.llvm.org/2010/05/glasgow-haskell-compiler-and-llvm.html

Upvotes: 1

SK-logic

Reputation: 9715

There is a compiler construction stack that is fully based on this idea. Any new language is implemented as a trivial translation into a lower-level language, or a combination of languages, already defined within this stack.

http://www.meta-alternative.net/mbase.html

However, in order to do so, you'd need at least some metaprogramming capabilities in every little language you add to the hierarchy. This requirement imposes some severe limitations on language semantics.

Upvotes: 0

slebetman

Reputation: 113876

Incidentally, as a counterexample, Tcl is one language that is known to be very hard (if not outright impossible) to translate to C. Over the last 20 years a couple of projects have tried, and there was even one promise of a commercial product, but none has materialized.

In part it is because Tcl is a very dynamic language (as any language with an eval function is). In part it is because the only way to know if something is code or data is to run the program.
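A rough analogy in Python (not Tcl, and only a sketch): the same string below is either data or code depending on a decision made at run time. The function name `run` and its `treat_as_code` flag are hypothetical, chosen just for this illustration.

```python
def run(payload: str, treat_as_code: bool):
    """Return the payload unchanged, or the result of evaluating it."""
    if treat_as_code:
        # Only at this point is the string known to be code.
        return eval(payload)
    # Otherwise it was data all along.
    return payload


if __name__ == "__main__":
    text = "1 + 2"
    print(run(text, treat_as_code=False))  # the string itself
    print(run(text, treat_as_code=True))   # the evaluated expression
```

A static translator looking at `"1 + 2"` cannot tell whether to emit a string constant or an addition, so the generated C would have to carry an interpreter along anyway.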

Upvotes: 2

Tommy

Reputation: 100622

Since Objective-C is a strict superset of C and C++ contains a very large amount that is a lot like C, to parse either you effectively already need to be able to parse C. In that case, emitting machine code and emitting more C code aren't substantially different in processing cost; the main cost to the user is that compiling now takes as long as it originally did plus the time a second compiler takes.

Any attempt to copy and paste the stuff that looks like C and translate the rest around it would be prone to problems. Firstly, C++ isn't a strict superset of C, so things that look like C don't necessarily compile exactly the same anyway (especially versus C99). And even if they did, supposing a user made an error in their C code, compilers don't tend to provide error information in a machine-readable format, so it'd be really hard for the Objective-C-to-C layer to give the user a meaningful error after receiving e.g. "error at line 99".
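One partial mitigation that source-to-source translators can use is the standard C `#line` directive, which tells the C compiler which original file and line to report. The sketch below, in Python, shows a translator stamping its output that way; the function name `emit_with_line_markers` and the file name `demo.m` are assumptions for illustration, and the sketch copies lines verbatim rather than translating anything.

```python
def emit_with_line_markers(source: str, filename: str) -> str:
    """Prefix each output line with a C `#line` directive naming its origin."""
    out = []
    for lineno, text in enumerate(source.splitlines(), start=1):
        # `#line N "file"` makes the C compiler report N and file
        # for the line that follows, instead of the generated position.
        out.append(f'#line {lineno} "{filename}"')
        out.append(text)
    return "\n".join(out)


if __name__ == "__main__":
    print(emit_with_line_markers("int x;\nint y;", "demo.m"))
```

This only fixes the reported position. The message text still describes the generated C rather than the construct the user actually wrote, so the underlying problem remains.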

That said, many compiler suites, like GCC and even more so like the upcoming Clang + LLVM, use an intermediate form to decouple the bit that knows about the specifics of one architecture from the bit that knows the specifics of a particular language. However, it tends to be more of a data structure than something intentionally easy to express as a written language.

So: compilers don't work like this for purely practical reasons.

Upvotes: 1

rtpg

Reputation: 2439

This is actually used by a lot of languages, through intermediate languages. The biggest example would be Pascal, which had the Pascal-P system: Pascal was compiled into a hypothetical assembly language. Porting Pascal then only meant writing a compiler for this assembly language, a task a lot simpler than porting the entire Pascal compiler. After writing this compiler, you'd only need to run it on the (machine-independent) Pascal compiler, which was itself available in this assembly language.
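As a minimal sketch of why targeting a hypothetical machine makes porting cheap, here is a tiny stack machine in Python. The instruction names (`PUSH`, `ADD`, `MUL`) and the function `run_stack_program` are invented for illustration and are not actual Pascal-P opcodes; the sketch also interprets the program rather than compiling it, purely for brevity.

```python
def run_stack_program(program):
    """Execute a list of (opcode, *operands) tuples on a simple stack machine."""
    stack = []
    for op, *args in program:
        if op == "PUSH":
            stack.append(args[0])
        elif op == "ADD":
            right = stack.pop()
            left = stack.pop()
            stack.append(left + right)
        elif op == "MUL":
            right = stack.pop()
            left = stack.pop()
            stack.append(left * right)
        else:
            raise ValueError(f"unknown opcode: {op}")
    return stack.pop()


# Hypothetical compiler output for the expression (2 + 3) * 4.
PROGRAM = [("PUSH", 2), ("PUSH", 3), ("ADD",), ("PUSH", 4), ("MUL",)]

if __name__ == "__main__":
    print(run_stack_program(PROGRAM))
```

Only this small routine has to be rewritten for a new machine; everything that produces `PROGRAM` stays the same.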

Bootstrapping is also used quite often in programming language design. Many languages have their compilers written in the same language (Haskell comes to mind here). By doing this, adding new functionality to the language simply means translating that idea into the current language, putting it into the compiler, then recompiling.

I don't think the problem with this method is really the readability of the generated code (I don't sift through the assembly or bytecode generated by compilers, personally), but one of optimization. Many ideas in higher-level programming languages (weak typing comes to mind) are hard to translate automatically into lower-level system languages such as C. There's a reason why GCC tends to do its optimization before code generation.

But for the most part, compilers do translate into simpler languages except for maybe the most basic of system languages.

Upvotes: 6
