Aaron Schif

Reputation: 2442

When are .pyc files refreshed?

I understand that ".pyc" files are compiled versions of the plain-text ".py" files, created at runtime to make programs run faster. However, I have observed a few things:

  1. Upon modification of ".py" files, program behavior changes. This indicates that the ".py" files are compiled, or at least go through some sort of hashing process or timestamp comparison, in order to tell whether or not they should be recompiled.
  2. Upon deleting all ".pyc" files (rm *.pyc), program behavior sometimes changes, which would indicate that they are not always recompiled when the ".py" files are updated.

Questions:

Upvotes: 103

Views: 60332

Answers (3)

Bowie Owens

Reputation: 2986

For Python 3, the relevant documentation is at:

https://docs.python.org/3/reference/import.html#pyc-invalidation

It presently states:

5.4.7. Cached bytecode invalidation

Before Python loads cached bytecode from a .pyc file, it checks whether the cache is up-to-date with the source .py file. By default, Python does this by storing the source’s last-modified timestamp and size in the cache file when writing it. At runtime, the import system then validates the cache file by checking the stored metadata in the cache file against the source’s metadata.

Python also supports “hash-based” cache files, which store a hash of the source file’s contents rather than its metadata. There are two variants of hash-based .pyc files: checked and unchecked. For checked hash-based .pyc files, Python validates the cache file by hashing the source file and comparing the resulting hash with the hash in the cache file. If a checked hash-based cache file is found to be invalid, Python regenerates it and writes a new checked hash-based cache file. For unchecked hash-based .pyc files, Python simply assumes the cache file is valid if it exists. Hash-based .pyc files validation behavior may be overridden with the --check-hash-based-pycs flag.

Changed in version 3.7: Added hash-based .pyc files. Previously, Python only supported timestamp-based invalidation of bytecode caches.
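As a minimal sketch (Python 3.7+), the standard-library py_compile module can write each kind of cache file described above, and the 4-byte flags field that follows the magic number in the .pyc header (layout per PEP 552) shows which kind was written. The helper names pyc_flags and flags_for_mode are illustrative, not part of any library.

```python
import pathlib
import py_compile
import tempfile

Mode = py_compile.PycInvalidationMode  # TIMESTAMP, CHECKED_HASH, UNCHECKED_HASH


def pyc_flags(pyc_path):
    """Return the 32-bit flags field of a .pyc header (PEP 552 layout).

    flags == 0 -> timestamp-based; bit 0 set -> hash-based; bit 1 set -> checked.
    """
    with open(pyc_path, "rb") as f:
        header = f.read(8)  # 4-byte magic number followed by 4-byte flags
    return int.from_bytes(header[4:8], "little")


def flags_for_mode(mode):
    """Compile a throwaway module with the given invalidation mode and
    return the flags written into its .pyc header."""
    with tempfile.TemporaryDirectory() as tmp:
        src = pathlib.Path(tmp) / "demo.py"
        src.write_text("x = 1\n")
        # Writes __pycache__/demo.<tag>.pyc and returns that path.
        pyc = py_compile.compile(str(src), doraise=True, invalidation_mode=mode)
        return pyc_flags(pyc)


if __name__ == "__main__":
    for mode in Mode:
        # Expected: TIMESTAMP 0, CHECKED_HASH 3, UNCHECKED_HASH 1
        print(mode.name, flags_for_mode(mode))
```

Which of these files the interpreter actually re-validates can then be changed at run time with --check-hash-based-pycs, which accepts default, always, or never.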

Upvotes: 0

DaveTheScientist

Reputation: 3399

The .pyc files are created (and possibly overwritten) only when that Python file is imported by some other script. When the import runs, Python checks whether the source modification time stored inside the .pyc file matches that of the corresponding .py file (Python 3.3 and later also compare the source size). If it matches, Python loads the .pyc; if it does not, or if the .pyc does not yet exist, Python compiles the .py file into a .pyc and loads that.
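A minimal sketch of that timestamp check, assuming CPython 3.7+ and the 16-byte header from PEP 552 (magic, flags, source mtime, source size). timestamp_pyc_is_fresh and demo are hypothetical helpers, not the import system's own code. Note that the comparison is an exact match, not "newer than", so replacing a source file with an older copy still invalidates the cache.

```python
import importlib.util
import os
import py_compile
import tempfile


def timestamp_pyc_is_fresh(source_path):
    """Hypothetical helper mimicking CPython 3.7+'s timestamp validation.

    Returns True only if a timestamp-based .pyc exists at the cache location
    and its stored source mtime and size match the source file's current ones.
    """
    pyc_path = importlib.util.cache_from_source(source_path)
    try:
        with open(pyc_path, "rb") as f:
            header = f.read(16)
    except FileNotFoundError:
        return False  # no cache yet: an import would compile the source
    if len(header) < 16 or header[:4] != importlib.util.MAGIC_NUMBER:
        return False  # truncated, or written by a different Python version
    if int.from_bytes(header[4:8], "little") != 0:
        return False  # hash-based .pyc: validated by hash, not by timestamp
    stored_mtime = int.from_bytes(header[8:12], "little")
    stored_size = int.from_bytes(header[12:16], "little")
    st = os.stat(source_path)
    # Exact match on both fields, each truncated to 32 bits.
    return (stored_mtime == (int(st.st_mtime) & 0xFFFFFFFF)
            and stored_size == (st.st_size & 0xFFFFFFFF))


def demo():
    """Compile a throwaway module, edit it, and check freshness before/after."""
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "mod.py")
        with open(src, "w") as f:
            f.write("x = 1\n")
        py_compile.compile(src, doraise=True,
                           invalidation_mode=py_compile.PycInvalidationMode.TIMESTAMP)
        fresh_before = timestamp_pyc_is_fresh(src)
        with open(src, "a") as f:
            f.write("y = 2\n")  # the size changes even if the mtime does not
        return fresh_before, timestamp_pyc_is_fresh(src)
```

Checking the size as well as the mtime is what catches edits made within the same second, which a coarse timestamp alone could miss.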

What do you mean by "stricter checking"?

Upvotes: 93

Zags

Reputation: 41258

.pyc files are generated whenever the corresponding modules are imported, and updated if the corresponding source files have changed. If the .pyc files are deleted, they are automatically regenerated on the next import. However, they are not automatically deleted when the corresponding source files are deleted.

This can cause some really fun bugs during file-level refactors.

First of all, you can end up pushing code that only works on your machine and on no one else's. If you have dangling references to files you deleted, these will still work locally unless you manually delete the relevant .pyc files, because Python will import a .pyc file even when its .py source is gone. This is compounded by the fact that a properly configured version control system will only push .py files to the central repository, not .pyc files, meaning that your code can pass the "import test" (does everything import okay?) just fine and still not work on anyone else's computer. (This describes Python 2's layout; since Python 3.2, a .pyc inside a __pycache__ directory is ignored once its .py file is missing, so only a leftover .pyc sitting in the source's own directory is still importable.)
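Whether an orphaned .pyc is still importable depends on where it lives. As a minimal Python 3 sketch: a legacy .pyc written next to the source (the Python 2 layout) is loaded even after the .py file is deleted, while one in __pycache__ is ignored (PEP 3147). The function and module names below are hypothetical.

```python
import importlib
import os
import py_compile
import sys
import tempfile


def importable_after_source_deleted(legacy_location):
    """Compile a throwaway module, delete its .py file, then try to import it.

    legacy_location=True  -> write the .pyc next to the source (Python 2 style)
    legacy_location=False -> write it to __pycache__ (Python 3 default)
    """
    name = "orphan_legacy_demo" if legacy_location else "orphan_cached_demo"
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, name + ".py")
        with open(src, "w") as f:
            f.write("VALUE = 42\n")
        cfile = os.path.join(tmp, name + ".pyc") if legacy_location else None
        py_compile.compile(src, cfile=cfile, doraise=True)
        os.remove(src)  # simulate deleting the module during a refactor
        sys.path.insert(0, tmp)
        importlib.invalidate_caches()
        try:
            importlib.import_module(name)
            return True   # loaded from the sourceless .pyc
        except ImportError:
            return False  # nothing importable was left behind
        finally:
            sys.path.remove(tmp)
            sys.modules.pop(name, None)
```

So on Python 3 the "works only on my machine" trap mainly comes from legacy .pyc files left over from Python 2 or written explicitly next to the source.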

Second, you can run into some pretty terrible bugs if you turn packages into modules. When you convert a package (a folder with an __init__.py file) into a module (a .py file), the .pyc files that once represented that package remain. In particular, the __init__.pyc remains. So, if you have a package foo with some code that doesn't matter, then later delete that package's source files (the folder, with its leftover .pyc files, stays behind), create a file foo.py with the function def bar(): pass, and run:

from foo import bar

you get:

ImportError: cannot import name bar

because Python is still using the old .pyc files from the foo package, none of which define bar. This can be especially problematic on a web server, where perfectly working code can break because of stale .pyc files.

For both of these reasons (and possibly others), your deployment and testing code should delete .pyc files, for example with the following line of bash:

find . -name '*.pyc' -delete
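Where find is not available (for example on Windows), a rough pure-Python equivalent might look like the sketch below. It also removes whole __pycache__ directories, which the bash line leaves behind as empty folders on Python 3. remove_bytecode and demo are hypothetical helpers, and the cpython-312 file name in the demo is a fake placeholder.

```python
import pathlib
import shutil
import tempfile


def remove_bytecode(root="."):
    """Delete __pycache__ directories and stray *.pyc/*.pyo files under root.

    Returns the number of bytecode files removed.
    """
    root = pathlib.Path(root)
    removed = 0
    # Python 3 layout: drop each __pycache__ directory wholesale.
    for cache_dir in list(root.rglob("__pycache__")):
        if cache_dir.is_dir():
            removed += sum(1 for _ in cache_dir.rglob("*.py[co]"))
            shutil.rmtree(cache_dir)
    # Legacy layout: .pyc/.pyo files sitting next to their (possibly deleted) sources.
    for pyc in list(root.rglob("*.py[co]")):
        if pyc.is_file():
            pyc.unlink()
            removed += 1
    return removed


def demo():
    """Build a throwaway tree containing both cache layouts and clean it."""
    with tempfile.TemporaryDirectory() as tmp:
        root = pathlib.Path(tmp)
        (root / "pkg" / "__pycache__").mkdir(parents=True)
        (root / "pkg" / "__init__.py").write_text("")
        (root / "pkg" / "__pycache__" / "__init__.cpython-312.pyc").write_bytes(b"")
        (root / "old_module.pyc").write_bytes(b"")  # Python 2-style leftover
        removed = remove_bytecode(root)
        leftovers = sorted(p.name for p in root.rglob("*"))
        return removed, leftovers
```

Calling remove_bytecode() at the start of a test or deploy script gives the same protection as the find command above.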

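Also, as of Python 2.6, you can run Python with the -B flag (or set the PYTHONDONTWRITEBYTECODE environment variable) so that it does not write .pyc files on import; note that existing .pyc files are still read. See How to avoid .pyc files? for more details.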
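Both -B and PYTHONDONTWRITEBYTECODE set sys.dont_write_bytecode to True at startup, so the effect can be sketched in-process by toggling that attribute around an import. import_writes_pyc and the module name bytecode_flag_demo are hypothetical.

```python
import importlib
import importlib.util
import os
import sys
import tempfile


def import_writes_pyc(dont_write_bytecode):
    """Import a throwaway module and report whether a .pyc was written.

    Setting sys.dont_write_bytecode is the in-process equivalent of -B.
    """
    saved = sys.dont_write_bytecode
    sys.dont_write_bytecode = dont_write_bytecode
    name = "bytecode_flag_demo"  # hypothetical module name
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, name + ".py")
        with open(src, "w") as f:
            f.write("x = 1\n")
        sys.path.insert(0, tmp)
        importlib.invalidate_caches()
        try:
            importlib.import_module(name)
            # The cache path the import system would write to.
            return os.path.exists(importlib.util.cache_from_source(src))
        finally:
            sys.path.remove(tmp)
            sys.modules.pop(name, None)
            sys.dont_write_bytecode = saved
```

Running python -B, or exporting PYTHONDONTWRITEBYTECODE=1, before starting the interpreter has the same effect for every import in the process.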

See also: How do I remove all .pyc files from a project?

Upvotes: 37
