RyeGrain
RyeGrain

Reputation: 71

Can I read non-code files in a Python Zip archive?

I have a Python application in a directory dir. This directory has a __main__.py file and several data files that are read by the application using open(...,'r'). Without editing the code, it it possible to bundle the code and data files into a single zip file and execute it using something like python app.pyz

Can I bundle code and data into one file?

Upvotes: 1

Views: 603

Answers (1)

Dennis
Dennis

Reputation: 91

As of Python 3.9 you can use importlib.resources from the standard library. This module uses Python's import machinery to resolve the paths of data files as though they were modules inside a package.

  • Create a new package inside dir. Let's call it data. Make sure it has an __init__.py.

  • Add your data files to data. Let's say you added a text file text.txt and a binary file binary.dat.

Now from your __main__.py script or any part of your code with access to the module data, you can access files inside that package like so:

  • To read text.txt to memory as a string:
txt_file = importlib.resources.files("data").joinpath("text.txt").read_text(encoding="utf-8")
  • To read binary.dat to memory as bytes:
bin_file = importlib.resources.files("data").joinpath("binary.dat").read_bytes()
  • To open any file:
path = importlib.resources.files("data").joinpath("text.txt")
with path.open("rt", encoding="utf-8") as file:
    lines = file.readlines()
# As streams:
textio_stream = importlib.resources.files("data").joinpath("text.txt").open("rt", encoding="utf-8")
bytesio_stream = importlib.resources.files("data").joinpath("binary.dat").open("rb")
  • If something requires an actual real file on the filesystem, or you simply want to wrap zipapp compatibility over existing code (e.g. with open()) without having to modify it:
# Old, incompatible with zipfiles.
file_path = "data/text.txt"
with open(file_path, "rt", encoding="utf-8") as file:
    lines = file.readlines()
# New, compatible with zipfiles.
file_path = importlib.resources.files("data").joinpath("text.txt")

# If file is inside a zipfile, unzips it in a temporary file, then
# destroys it once the context manager closes. Otherwise, reads the file normally.
with importlib.resources.as_file(file_path) as path:
    with open(path, "rt", encoding="utf-8") as file:
        lines = file.readlines()
# Since it is a context manager, you can even store it like this:
file_path = importlib.resources.files("data").joinpath("text.txt")
real_path = importlib.resources.as_file(file_path)

with real_path as path:
    with open(path, "rt", encoding="utf-8") as file:
        lines = file.readlines()

The Traversable objects returned from importlib.resources functions can be mixed with Path objects using as_posix, since joinpath requires posix separators:

file_path = pathlib.Path("subdirectory", "text.txt")
txt_file = importlib.resources.files("data").joinpath(file_path.as_posix()).read_text(encoding="utf-8")

You can use slashes to grow a Traversable, just like pathlib.Path objects:

resources_root = importlib.resources.files("data")
text_path = resources_root / "text.txt"
bin_file = (resources_root / "subdirectory" / "bin.dat").read_bytes()

You can also import the data package like any other package, and use the module object directly. Subpackages are also supported. The only Python files inside the data tree are the __init__.py files of each subpackage:

# __main__.py
import importlib.resources
import data.config
import data.models.b

# Load binary file `file.dat` from `data.models.b`.
# Subpackages are being used as subdirectories.
bin_file = importlib.resources.files(data.models.b).joinpath("file.dat").read_bytes()
...

You technically only need to make your resource root directory be a package. For max brevity:

# __main__.py
from importlib.resources import files
data = files("data")  # Resources root.

# In this example, `models` and `b` are regular directories:
bin_file = (data / "models" / "b" / "file.dat").read_bytes()
...

Note that importlib.resources and zipfiles in general support reading only and you will get an exception if you try to write to any file-like object returned from the above functions. It might technically be possible to support modifying data files inside zips but this is way out of scope. If you want to write files, just open a file in the filesystem as normal.

Now your data files have become file-system agnostic and your program should work via zipapp and normal invocation just the same.

Upvotes: 3

Related Questions