Reputation: 17040
I understand a few things based on the following link (I could be wrong!):
http://docs.python.org/2/glossary.html#term-bytecode
1. .pyc is a cached file and is only generated if the module is imported somewhere else.
2. .pyc is to help loading performance, not execution performance.
3. Running python foo.py does not generate foo.pyc unless foo is imported somewhere.
4. Python has a bytecode compiler (used to generate .pyc).
5. Python's virtual machine executes bytecode.
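Point 4 can be checked directly from the interpreter: the bytecode compiler is exposed as the built-in compile(), which returns a code object. A quick illustration (the filename "foo.py" here is just a label passed to the compiler, not a real file):

```python
# compile() is the bytecode compiler, exposed as a built-in.
# It turns source text into a code object without touching the disk.
code = compile("print('hello world')", "foo.py", "exec")
print(type(code).__name__)   # code
print(code.co_filename)      # foo.py
```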
So, when I run python foo.py, if foo.py is not imported anywhere, does Python still create the bytecode in memory? The missing .pyc seems to contradict the idea of a Python VM.
This question extends to code execution in the Python interpreter (running python in the terminal). I believe CPython (like just about any language implementation) can't do pure interpretation. I think the core of the question is: does the VM actually read the .pyc file? I assume the VM loads the .pyc into the execution environment.
Upvotes: 1
Views: 645
Reputation: 179462
Python is incapable of directly executing source code (unlike some other scripting languages which do ad hoc parsing, e.g. Bash). All Python source code must be compiled to bytecode, no matter what the source is (this includes code run through eval and exec). Generating bytecode is rather expensive because it involves running a parser, so caching the bytecode (as .pyc) speeds up module loading by avoiding the parsing phase.
The difference between import foo and python foo.py is simply that the latter doesn't cache the bytecode that is generated.
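The point about eval and exec can be seen directly: exec accepts either a source string (compiled on the spot) or a pre-compiled code object, and either way the VM only ever executes bytecode. A small sketch:

```python
import dis

# Compile a statement once, then hand the code object to exec().
code = compile("x = 1 + 2", "<string>", "exec")
ns = {}
exec(code, ns)
print(ns["x"])   # 3

# dis shows the bytecode the VM actually runs for this code object.
dis.dis(code)
```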
Upvotes: 2
Reputation: 11130
Interesting ... the first thing I did was check --help:
$ python --help
usage: python [option] ... [-c cmd | -m mod | file | -] [arg] ...
Options and arguments (and corresponding environment variables):
-B : don't write .py[co] files on import; also PYTHONDONTWRITEBYTECODE=x
...
and the first option I see disables automatic .pyc and .pyo file generation on import, though that's probably just because the options are listed alphabetically.
Let's run some tests:
$ echo "print 'hello world'" > test.py
$ python test.py
hello world
$ ls test.py*
test.py
$ python -c "import test"
hello world
$ ls test.py*
test.py test.pyc
So it only generated the .pyc file when it was imported.
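The same behaviour can be checked from inside Python. Note that this sketch assumes Python 3, where the cache moved from a sibling test.pyc file (as in the Python 2 sessions shown here) into a __pycache__ directory:

```python
import importlib.util
import sys

# Where would the interpreter cache the bytecode for this source file?
# (Python 3 uses a __pycache__ subdirectory; Python 2 wrote test.pyc
# next to test.py instead.)
cache_path = importlib.util.cache_from_source("test.py")
print(cache_path)

# The -B option from --help surfaces here; True means "don't write .pyc".
print(sys.dont_write_bytecode)
```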
Now, to check which files are being used, I'll use OS X's dtruss (similar to strace on Linux) to do a full trace ...
$ echo '#!/bin/sh
python test.py' > test.sh
$ chmod a+x test.sh
$ sudo dtruss -a ./test.sh 2>&1 | grep "test.py*"
975/0x5713: 244829 6 3 read(0x3, "#!/bin/sh \npython test.py\n\b\0", 0x50) = 26 0
975/0x5713: 244874 4 2 read(0xFF, "#!/bin/sh \npython test.py\n\b\0", 0x1A) = 26 0
977/0x5729: 658694 6 2 readlink("test.py\0", 0x7FFF5636E360, 0x400) = -1 Err#22
977/0x5729: 658726 10 6 getattrlist("/Users/samyvilar/test.py\0", 0x7FFF7C0EE510, 0x7FFF5636C6E0 = 0 0
977/0x5729: 658732 3 1 stat64("test.py\0", 0x7FFF5636DCB8, 0x0) = 0 0
977/0x5729: 658737 5 3 open_nocancel("test.py\0", 0x0, 0x1B6) = 3 0
977/0x5729: 658760 4 2 stat64("test.py\0", 0x7FFF5636E930, 0x1) = 0 0
977/0x5729: 658764 5 2 open_nocancel("test.py\0", 0x0, 0x1B6) = 3 0
From the looks of it, Python did not even touch the test.pyc file at all!
$ echo '#!/bin/sh
python -c "import test"' > test.sh
$ chmod a+x test.sh
$ sudo dtruss -a ./test.sh 2>&1 | grep "test.py*"
1028/0x5d74: 654642 8 5 open_nocancel("test.py\0", 0x0, 0x1B6) = 3 0
1028/0x5d74: 654683 8 5 open_nocancel("test.pyc\0", 0x0, 0x1B6) = 4 0
$
Well, that's interesting: it looks like it opened test.py and then test.pyc.
What happens when we delete the .pyc file?
$ rm test.pyc
$ sudo dtruss -a ./test.sh 2>&1 | grep "test.py*"
1058/0x5fd6: 654151 7 4 open_nocancel("/Users/samyvilar/test.py\0", 0x0, 0x1B6) = 3 0
1058/0x5fd6: 654191 6 3 open_nocancel("/Users/samyvilar/test.pyc\0", 0x0, 0x1B6) = -1 Err#2
1058/0x5fd6: 654234 7 3 unlink("/Users/samyvilar/test.pyc\0", 0x1012B456F, 0x1012B45E0) = -1 Err#2
1058/0x5fd6: 654400 171 163 open("/Users/samyvilar/test.pyc\0", 0xE01, 0x81A4) = 4 0
It first opened test.py, then 'tried' to open test.pyc, which returned an error; then it called unlink and generated the .pyc file again ... interesting, I thought it would just check for it.
What if we delete the original .py file?
$ sudo dtruss -a ./test.sh 2>&1 | grep "test.py*"
1107/0x670d: 655064 4 1 open_nocancel("test.py\0", 0x0, 0x1B6) = -1 Err#2
1107/0x670d: 655069 8 4 open_nocancel("test.pyc\0", 0x0, 0x1B6) = 3 0
No surprise there: it couldn't open test.py, but it still continued. To this day I'm not sure this is actually 'ok'; Python should give out some kind of warning. I've been burned a couple of times by this: accidentally deleting my files, running my tests and feeling a sigh of relief as they pass, only to start sweating when I can't seem to find the source code!
After these tests we can assume Python only uses .pyc files either directly, when invoked as python test.pyc, or indirectly, when importing; otherwise it doesn't seem to use them.
Supposedly CPython's compiler was designed to be fairly fast: it doesn't do much type checking and it generates quite high-level bytecode, so most of the workload is actually done by the virtual machine. It probably does a single pass, lexing -> compiling -> bytecode, in one go, and it does this every time it reads a Python file from the command line, or on import when no .pyc file is present (in which case it also creates one).
This may be why some other implementations are faster: they take more time to compile but generate lower-level bytecode that can be better optimized.
It's extremely difficult to build a virtual machine that does pure interpretation efficiently ...
It's all about balance: the more powerful your bytecode, the simpler your compiler can be, but the more complex and slower your virtual machine has to be, and vice versa ...
Upvotes: 1
Reputation: 12900
Your points 1 to 5 are correct, with the exception (if we're precise) of point 4. The Python interpreter has a part called the bytecode compiler that turns source code into a <code object at 0x...>, which you can inspect by typing f.__code__ for any function f. This is the real bytecode that is interpreted. These code objects may then, as a separate step, be saved inside .pyc files.
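For example, inspecting the code object of a small function (the function name here is just an illustration):

```python
import dis

def double(x):
    return x * 2

code = double.__code__       # the compiled code object for this function
print(code.co_varnames)      # ('x',)
dis.dis(double)              # human-readable view of the bytecode
```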
Here are the operations in more detail. The bytecode compiler runs only once per module: when you load foo.py and each of the modules it imports. It's not a very long operation, but it still takes some time, particularly if your module imports a lot of other modules. This is where .pyc files enter the picture. After an import statement has invoked the bytecode compiler, Python tries to save the resulting <code object> inside a .pyc file. The next time, if the .pyc file already exists and the .py file has not been modified, the <code object> is reloaded from there. This is just an optimization: it avoids the cost of invoking the bytecode compiler. In both cases the result is the same: a <code object> is created in memory and is going to be interpreted.
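This save-and-reload cycle can be imitated by hand with the standard py_compile and marshal modules. A sketch, assuming Python 3 (where the .pyc lands in __pycache__ and carries a 16-byte header, unlike the Python 2 layout discussed here):

```python
import marshal
import os
import py_compile
import tempfile

# Write a tiny module, then compile it to a .pyc explicitly.
src = os.path.join(tempfile.mkdtemp(), "foo.py")
with open(src, "w") as f:
    f.write("ANSWER = 42\n")

pyc = py_compile.compile(src)    # returns the path of the cached file
print(pyc)

# A .pyc is a small header followed by the marshalled code object
# (the header is 16 bytes on Python 3.7+; older versions differ).
with open(pyc, "rb") as f:
    f.seek(16)
    code = marshal.load(f)

# The reloaded code object runs exactly like a freshly compiled one.
ns = {}
exec(code, ns)
print(ns["ANSWER"])
```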
This caching only works for import statements, not for the main module (i.e. the foo.py in the command line python foo.py). The idea is that it should not really matter: where the bytecode compiler would lose time in a typical medium-to-large program is in compiling all the directly and indirectly imported modules, not in compiling just foo.py.
Upvotes: 5