Reputation: 10764

How can PyPy be faster than CPython

I have read "PyPy -- How can it possibly beat CPython?" and countless other things, but I am not able to understand how something written in Python can be faster than Python itself.

The only way I can think of is that PyPy somehow bypasses C and directly compiles into assembly language instructions. If that is the case, then it is fine.

Can someone explain to me how PyPy works? I need a simple answer.

I love Python and want to start contributing. PyPy looks like an awesome place to start, irrespective of whether they pull my code or not. But I have not been able to understand it from the brief research I have done.

Upvotes: 6

Answers (4)

Lewis Diamond

Reputation: 24911

PyPy has a JIT (just-in-time) compiler. JIT compilation can make optimizations while the program is running (because the code is not precompiled).

The code is not compiled to assembly or C from the beginning. It starts out as interpreted code (it runs in the PyPy interpreter), and the interpreter can then compile it "just in time".
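A toy sketch of the "interpret first, compile just in time once the code is hot" idea, assuming a made-up expression language (integers, variable names, and `("+", a, b)` / `("*", a, b)` tuples) and a hypothetical threshold of 3 calls. This is not how PyPy's JIT works internally (PyPy traces hot loops and emits machine code); here the "compiled" form is just Python source handed to CPython's `exec`, which is enough to show the switch from interpreting to running compiled code.

```python
HOT_THRESHOLD = 3  # hypothetical: number of calls before we bother compiling


def interpret(expr, env):
    """Evaluate a tiny expression tree by walking it on every call."""
    if isinstance(expr, int):
        return expr
    if isinstance(expr, str):
        return env[expr]
    op, left, right = expr
    a = interpret(left, env)
    b = interpret(right, env)
    return a + b if op == "+" else a * b


def to_source(expr):
    """Translate the same tree into equivalent Python source text."""
    if isinstance(expr, int):
        return repr(expr)
    if isinstance(expr, str):
        return expr
    op, left, right = expr
    return f"({to_source(left)} {op} {to_source(right)})"


class JitFunction:
    """Interprets its body until it has been called HOT_THRESHOLD times,
    then generates and compiles a real Python function and uses that."""

    def __init__(self, params, body):
        self.params = params
        self.body = body
        self.calls = 0
        self.compiled = None  # set once the function becomes "hot"

    def __call__(self, *args):
        if self.compiled is not None:
            return self.compiled(*args)  # fast path: already compiled
        self.calls += 1
        if self.calls >= HOT_THRESHOLD:
            src = (
                f"def jitted({', '.join(self.params)}):\n"
                f"    return {to_source(self.body)}\n"
            )
            namespace = {}
            exec(src, namespace)  # CPython compiles the generated source
            self.compiled = namespace["jitted"]
            return self.compiled(*args)
        return interpret(self.body, dict(zip(self.params, args)))


if __name__ == "__main__":
    f = JitFunction(["x", "y"], ("+", ("*", "x", "x"), "y"))
    print([f(i, 1) for i in range(5)])  # first 2 calls interpreted, rest compiled
```

Running it prints `[1, 2, 5, 10, 17]`; the results are the same whether a call was interpreted or compiled, which is the property any JIT has to preserve.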

http://en.wikipedia.org/wiki/Just-in-time_compilation

http://en.wikipedia.org/wiki/Interpreted_language

Upvotes: 2

Ben

Reputation: 71420

The easiest way to understand PyPy is to forget that it's implemented in Python.

It actually isn't, anyway; it's implemented in RPython. RPython code is runnable with a Python interpreter, but arbitrary Python code cannot be compiled by the RPython compiler (the PyPy translation framework). RPython is a subset of Python, but the parts that are "left out" are substantial enough that programming in RPython is very different from programming normally in Python.
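To make "left out" a bit more concrete, here is a small illustration based on one rule from the RPython coding guide: at each control-flow point a variable should hold values of a single type, so joining two paths where the same variable is an int on one and a str on the other must be avoided. Both functions below are ordinary Python and run fine on CPython. The function names are made up, and I have not run them through the RPython translator, so treat the comments as a reading of that rule rather than translator output.

```python
def double_or_zero(n):
    # `result` is an int on both branches: consistent with RPython's
    # one-type-per-variable rule.
    if n > 0:
        result = n * 2
    else:
        result = 0
    return result


def double_or_message(n):
    # Perfectly idiomatic dynamic Python, but `result` is an int on one
    # branch and a str on the other when the paths join, which the RPython
    # coding guide says to avoid.
    if n > 0:
        result = n * 2
    else:
        result = "not positive"
    return result


if __name__ == "__main__":
    print(double_or_zero(5), double_or_zero(-1))        # 10 0
    print(double_or_message(5), double_or_message(-1))  # 10 not positive
```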

So, since Python code can't be treated as RPython code, and idiomatic RPython programs "look and feel" very different from idiomatic Python programs, let's ignore the connection between the two altogether and consider a made-up example.

Pretend I've developed a new language, Frobble, with a compiler. And I have written a Python interpreter in Frobble. I claim that my "FrobblePython" interpreter is often substantially faster than the CPython interpreter.

Does this strike you as weird or impossible? Of course not. A new Python interpreter can be either faster or slower than the CPython interpreter (or more likely, faster at some things and slower at others, by varying margins). Whether it's faster or not will depend upon the implementation of FrobblePython, as well as the performance characteristics of code compiled by my Frobble compiler.

That's exactly how you should think about the PyPy interpreter. The fact that the language used to implement it, RPython, happens to be interpretable by a Python interpreter (with the same external results as compiling the RPython program and running it) is completely irrelevant to understanding how fast it is. All that matters is the implementation of the PyPy interpreter and the performance characteristics of code compiled by the RPython compiler (such as the fact that the RPython compiler can automatically add certain kinds of JIT capability to the programs it compiles).
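The FrobblePython point, that speed comes from how an interpreter is implemented rather than from what it is written in, can be sketched with two interpreters for the same made-up stack-machine bytecode (the opcodes `push`, `add` and `mul` are invented for this example). They accept the same programs and return the same results; the second resolves each opcode to a handler function once, ahead of time, instead of comparing opcode strings on every instruction. The two strategies have different performance characteristics, and which one wins depends on the program and on how the host language executes them; no claim is made here that either is faster.

```python
def run_naive(program):
    """Dispatch by comparing opcode strings on every instruction."""
    stack = []
    for op, arg in program:
        if op == "push":
            stack.append(arg)
        elif op == "add":
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif op == "mul":
            b, a = stack.pop(), stack.pop()
            stack.append(a * b)
        else:
            raise ValueError(f"unknown opcode {op!r}")
    return stack.pop()


def op_push(stack, arg):
    stack.append(arg)


def op_add(stack, arg):
    b, a = stack.pop(), stack.pop()
    stack.append(a + b)


def op_mul(stack, arg):
    b, a = stack.pop(), stack.pop()
    stack.append(a * b)


HANDLERS = {"push": op_push, "add": op_add, "mul": op_mul}


def predecode(program):
    """Resolve every opcode to its handler once, before execution."""
    return [(HANDLERS[op], arg) for op, arg in program]


def run_predecoded(decoded):
    """Execute pre-resolved handlers; no string comparisons at run time."""
    stack = []
    for handler, arg in decoded:
        handler(stack, arg)
    return stack.pop()


if __name__ == "__main__":
    program = [("push", 6), ("push", 7), ("mul", None), ("push", 2), ("add", None)]
    print(run_naive(program), run_predecoded(predecode(program)))  # 44 44
```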

Upvotes: 15

fijal

Reputation: 3190

PyPy is itself written in RPython, which is a restricted subset of Python. While you can run it on top of CPython, that is very slow, so instead you translate this RPython into C, thereby bypassing the interpretation. In theory this alone could already be faster than CPython, but in practice it is quite a bit slower. On top of that, there is a just-in-time compiler (also written in RPython) that compiles Python to assembler.

In short, no double interpretation happens at any point during runtime, so there is no issue.

Upvotes: 2

user395760

Reputation:

The "it has a JIT" answer is technically correct but insufficient. PyPy run as Python code by a Python interpreter can JIT-compile the Python code it interprets (in fact, the JIT tests are often run this way), but it is still awfully slow (it can take minutes just to start interpreting).

The missing piece, which predates the JIT and is actually required for it, is writing the interpreter in a restricted subset of Python (called RPython) and then compiling it to C code. This way, you get a program that runs at roughly the level of abstraction of C (despite being written at a higher level of abstraction). This interpreter has historically been, and AFAIK still is, somewhat slower than CPython, but not several orders of magnitude slower (as an interpreted interpreter would be).

Your comment about "compiling directly to assembly" betrays confusion. Assembly code is not automatically faster than C code -- in fact you'd be hard-pressed to beat today's C compilers at generating assembly code, and C code is much easier to write and/or generate, even without getting into the whole portability mess. The problem isn't turning Python into C or assembly (take a look at Nuitka), the problem is paraphrasing the program in a more efficient manner without affecting semantics. Going straight for assembly does not solve any of the hard problems with that, makes the comparatively easy problem of generating code for the more efficient program harder, and very rarely allows any optimizations you can't also express in C.
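A small illustration (not from the original answer) of why "paraphrasing the program without affecting semantics" is the hard part: an optimizer would love to turn the loop below into the closed form `n * (n - 1) // 2`, but in Python the global name `range` can be rebound, and then the two are no longer equivalent. A real JIT has to guard against cases like this. The hypothetical `SOURCE`/`load` scaffolding exists only to run the same function text in two different global namespaces.

```python
SOURCE = """
def sum_below(n):
    total = 0
    for i in range(n):
        total += i
    return total
"""


def load(extra_globals):
    """Compile SOURCE in a fresh global namespace seeded with extra_globals."""
    namespace = dict(extra_globals)
    exec(SOURCE, namespace)
    return namespace["sum_below"]


def sum_below_closed_form(n):
    # What an optimizer would like to emit; only valid while `range` is the builtin.
    return n * (n - 1) // 2


if __name__ == "__main__":
    normal = load({})
    patched = load({"range": lambda n: [1] * n})  # someone rebinds `range`
    print(normal(10), sum_below_closed_form(10))  # 45 45
    print(patched(10))                            # 10: the closed form would be wrong
```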

Now, PyPy's JIT does generate machine code, but the PyPy executable is compiled from C code by a C compiler. The PyPy developers would be idiots if they attempted to compete with existing C compilers on even a single platform, let alone multiple platforms. Luckily, they aren't idiots and know that. The reasons for letting the JIT generate assembly code are different and much better (for starters, in the context of a JIT there are several optimizations you can't do in C).
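One example of the kind of optimization that needs a JIT (an illustration, with hypothetical function names): specializing code on a value that is only known at run time. Below, after "observing" an exponent, we generate straight-line multiplication code for exactly that exponent, protected by a guard that falls back to the generic loop for any other value. An ahead-of-time C compiler cannot do this for a value it never sees at compile time. A real JIT such as PyPy's would emit machine code rather than Python source; `exec` stands in for that step here.

```python
def generic_power(x, n):
    """Generic loop: works for any non-negative integer n."""
    result = 1
    for _ in range(n):
        result *= x
    return result


def specialize_power(n_observed):
    """Generate a power function specialized for one exponent seen at run time."""
    if not isinstance(n_observed, int) or n_observed < 0:
        raise ValueError("n_observed must be a non-negative int")
    body = " * ".join(["x"] * n_observed) or "1"  # e.g. "x * x * x" for 3
    src = (
        "def power(x, n):\n"
        f"    if n != {n_observed}:  # guard: assumption no longer holds\n"
        "        return generic_power(x, n)\n"
        f"    return {body}\n"
    )
    namespace = {"generic_power": generic_power}
    exec(src, namespace)
    return namespace["power"]


if __name__ == "__main__":
    cube = specialize_power(3)
    print(cube(2, 3), cube(5, 3))  # 8 125 (specialized path)
    print(cube(2, 5))              # 32 (guard fails, generic path)
```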

By the way, most of what I wrote above is also stated in the answers to the question you link to.

Upvotes: 7
