How does the python interpreter know when to compile and update a .pyc file?

Question

I know that a .pyc file is generated by the Python interpreter and contains the bytecode, as this question says.

I thought the Python interpreter used the timestamp to detect whether a .pyc is newer than its .py and, if so, skipped compiling it again on execution (the way make does).

So, I did a test, but it seemed I was wrong.

I wrote t.py containing print '123' and t1.py containing import t. Running python t1.py printed 123 and generated t.pyc, all as expected.
Then I edited t.py to print '1234' and updated the timestamp of t.pyc with touch t.pyc.
Running python t1.py again, I expected to get 123 but got 1234. So it seems the Python interpreter still knew that t.py had been updated.

Then I wondered whether the Python interpreter compiles and regenerates t.pyc on every run of python t1.py. But after running python t1.py several times, I found that t.pyc is not updated when t.py has not changed.

So, my question is: how does the Python interpreter know when to compile and update a .pyc file?

Updated

Since the Python interpreter uses the timestamp stored in the .pyc file, I assumed it was a record of when the .pyc was last updated, and that on import it is compared with the timestamp of the .py file.

So I tried to hack it this way: I set the OS clock to an earlier time and edited the .py file. I thought that on the next import the .py would look older than the .pyc, and the Python interpreter would not update the .pyc. But I was wrong again.

So, does the Python interpreter compare these two timestamps not for older or newer, but for exact equality?

By exact equality, I mean that the timestamp in the .pyc records when the .py was last modified. On import, it is compared with the current timestamp of the .py; if they are not the same, the .pyc is recompiled and updated.

svvac · Accepted Answer

It looks like the timestamp is stored directly in the *.pyc file. The Python interpreter doesn't rely on the last-modification attribute of the .pyc file itself, maybe to avoid incompatible bytecode issues when copying source trees.
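To see the stored timestamp for yourself, here is a minimal sketch that compiles a throwaway t.py and reads the header back. It assumes CPython 3.7 or later, where, as far as I know, a 4-byte flags field sits between the magic number and the timestamp, so the timestamp and size land at bytes 8 to 15 rather than at the offsets of older versions. The helper names read_timestamp_header and demo are mine, not part of any library.

```python
import os
import py_compile
import struct
import tempfile


def read_timestamp_header(pyc_path):
    """Return (source_mtime, source_size) stored in a timestamp-based .pyc.

    Assumes the 16-byte CPython 3.7+ header: magic, flags, mtime, size.
    """
    with open(pyc_path, "rb") as f:
        header = f.read(16)
    flags, mtime, size = struct.unpack("<III", header[4:16])
    if flags != 0:
        raise ValueError("hash-based .pyc: no source timestamp stored")
    return mtime, size


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "t.py")
        with open(src, "w") as f:
            f.write("print('123')\n")
        pyc = py_compile.compile(
            src,
            cfile=os.path.join(tmp, "t.pyc"),
            doraise=True,
            invalidation_mode=py_compile.PycInvalidationMode.TIMESTAMP,
        )
        before = read_timestamp_header(pyc)
        # Change the .pyc file's own modification time, much like `touch`.
        os.utime(pyc, (0, 0))
        after = read_timestamp_header(pyc)
        st = os.stat(src)
        expected = (int(st.st_mtime) & 0xFFFFFFFF, st.st_size & 0xFFFFFFFF)
        # First flag: header matches the source's mtime and size.
        # Second flag: touching the .pyc did not change what is stored inside.
        return before == expected, after == before
```

Both flags returned by demo() should come out true, which would explain why touch t.pyc in the question had no effect: the value that matters lives inside the file, not in its filesystem metadata.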

Looking at the Python implementation of the import statement, you can find the stale check in _validate_bytecode_header(). By the looks of it, it extracts bytes 4 to 7 (inclusive) and compares them against the modification time of the source file. If they don't match, the bytecode is considered stale and is recompiled.

In the process, it also checks the length of the source file against the length of the source used to generate a given bytecode (stored in bytes 8 to 11).
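The two checks can be sketched as a small pure function. This is an illustration of the header layout described above (bytes 4 to 7 for the timestamp, bytes 8 to 11 for the size, little-endian), not the actual importlib code; is_stale is a hypothetical name.

```python
import struct


def is_stale(pyc_header, source_mtime, source_size):
    """Return True if the bytecode should be recompiled.

    Assumed layout: bytes 0-3 magic, 4-7 source mtime, 8-11 source size.
    """
    stored_mtime, stored_size = struct.unpack("<II", pyc_header[4:12])
    # Exact equality, not "older vs newer": any mismatch means stale.
    return (
        stored_mtime != (int(source_mtime) & 0xFFFFFFFF)
        or stored_size != (source_size & 0xFFFFFFFF)
    )


# Placeholder magic bytes; a real .pyc carries a version-specific magic number.
header = b"\x00\x00\x00\x00" + struct.pack("<II", 1000, 12)
```

With this header, is_stale(header, 1000, 12) is false, while is_stale(header, 999, 12) is true even though the source looks older. That matches the clock-change experiment in the question: the comparison is for equality, so an "older" .py still triggers a recompile.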

In the Python implementation, if one of those checks fails, the bytecode loader raises an ImportError that is caught by SourceLoader.get_code(), which triggers a recompilation of the bytecode.
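The control flow is roughly the following. This is a simplified sketch under my own assumptions, not the real importlib source: build_pyc, load_validated, and get_code here are hypothetical stand-ins, the magic number is a placeholder, and the cache is kept in memory instead of on disk.

```python
import marshal
import struct

FAKE_MAGIC = b"\x00\x00\x00\x00"  # placeholder, not a real magic number


def build_pyc(source_text, mtime, size):
    """Compile the source and prepend a header holding its mtime and size."""
    code = compile(source_text, "<sketch>", "exec")
    header = FAKE_MAGIC + struct.pack("<II", mtime & 0xFFFFFFFF, size & 0xFFFFFFFF)
    return header + marshal.dumps(code)


def load_validated(data, mtime, size):
    """Return the cached code object, or raise ImportError if it is stale."""
    stored_mtime, stored_size = struct.unpack("<II", data[4:12])
    if stored_mtime != (mtime & 0xFFFFFFFF):
        raise ImportError("bytecode is stale (mtime mismatch)")
    if stored_size != (size & 0xFFFFFFFF):
        raise ImportError("bytecode is stale (size mismatch)")
    return marshal.loads(data[12:])


def get_code(source_text, mtime, cached):
    """Return (code, cache_bytes, recompiled) for the given source."""
    size = len(source_text.encode("utf-8"))
    if cached is not None:
        try:
            return load_validated(cached, mtime, size), cached, False
        except ImportError:
            pass  # stale cache: fall through and recompile
    fresh = build_pyc(source_text, mtime, size)
    return marshal.loads(fresh[12:]), fresh, True
```

Calling get_code twice with the same source and mtime reuses the cache on the second call; changing either the mtime or the length of the source makes the validation raise, and the function falls back to compiling from source and producing a new cache.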

Note: That's how it's done in the Python version of importlib. I guess there's no functional difference in the native version, but my C is a bit too rusty to dig into the compiler code.
