Reputation: 3317
How can I read a file that is inside my Python package?
A package that I load has a number of templates (text files used as strings) that I want to load from within the program. But how do I specify the path to such a file?
Imagine I want to read a file from:
mypackage\templates\temp_file
Some kind of path manipulation? Package base path tracking?
Upvotes: 259
Views: 158823
Reputation: 363304
Before you can even worry about reading resource files, the first step is to make sure that the data files are getting packaged into your distribution in the first place - it is easy to read them directly from the source tree, but the important part is making sure these resource files are accessible from code within an installed package.
Structure your project like this, putting data files into a subdirectory within the package:
. <--- project root
├── mypackage <--- source root
│ ├── __init__.py
│ ├── templates <--- resources subdirectory
│ │ └── temp_file <--- this is a data file, not code
│ ├── mymodule1.py
│ └── mymodule2.py
├── README.rst
├── MANIFEST.in
└── setup.py
You should pass include_package_data=True in the setup() call. The manifest file is only needed if you want to use setuptools/distutils and build source distributions. To make sure the templates/temp_file gets packaged for this example project structure, add a line like this into the manifest file:
recursive-include mypackage *
Historical cruft note: Using a manifest file is not needed for modern build backends such as flit or poetry, which will include the package data files by default. So, if you're using pyproject.toml and you don't have a setup.py file, then you can ignore all the stuff about MANIFEST.in.
Now, with packaging out of the way, onto the reading part...
Use importlib.resources.files. It returns a traversable object for accessing resources, with usage similar to pathlib:
import importlib.resources
my_resources = importlib.resources.files("mypackage")
data = (my_resources / "templates" / "temp_file").read_bytes()
This works in zips, and it doesn't require spurious __init__.py files to be added in resource subdirectories.
Python 3.9+ is required. For older Python versions, there is a backport available to install (importlib_resources) which supports the same APIs.
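As a minimal sketch of that call in action, the snippet below builds a throwaway package on disk and reads its data file through importlib.resources.files. The package name demo_pkg, the file contents, and the temporary directory are assumptions made up for illustration; with a real installed package, only the last few lines would be needed.

```python
import importlib
import sys
import tempfile
from importlib.resources import files
from pathlib import Path

# Build a hypothetical package "demo_pkg" with a templates/ subdirectory.
# Note that templates/ deliberately has no __init__.py.
root = Path(tempfile.mkdtemp())
pkg_dir = root / "demo_pkg"
(pkg_dir / "templates").mkdir(parents=True)
(pkg_dir / "__init__.py").write_text("")
(pkg_dir / "templates" / "temp_file").write_text("Hello, $name!", encoding="utf-8")

# Make the package importable, for this demonstration only.
sys.path.insert(0, str(root))
importlib.invalidate_caches()

# The actual resource access: no __file__ and no os.path involved.
resource = files("demo_pkg") / "templates" / "temp_file"
template_text = resource.read_text(encoding="utf-8")
print(template_text)
```

The object returned by files() supports the / operator, read_text() and read_bytes(), so the calling code does not need to know whether the package lives in a directory or in an archive.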
Building paths relative to __file__ was previously described in the accepted answer. At best, it looks something like this:
from pathlib import Path
resource_path = Path(__file__).parent / "templates"
data = resource_path.joinpath("temp_file").read_bytes()
What's wrong with that? The assumption that you have files and subdirectories available is not correct. This approach doesn't work if executing code which is packed in a zip or a wheel, and it may be entirely out of the user's control whether or not your package gets extracted to a filesystem at all.
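The failure mode can be reproduced with a small experiment. The sketch below, which assumes a made-up package called zipdemo_pkg packed into a made-up bundle.zip, imports the package straight from the archive and then compares the __file__-based lookup with importlib.resources.files (Python 3.9+).

```python
import importlib
import sys
import tempfile
import zipfile
from importlib.resources import files
from pathlib import Path

# Pack a hypothetical package "zipdemo_pkg" into a zip archive.
zip_path = Path(tempfile.mkdtemp()) / "bundle.zip"
with zipfile.ZipFile(zip_path, "w") as zf:
    zf.writestr("zipdemo_pkg/__init__.py", "")
    zf.writestr("zipdemo_pkg/templates/temp_file", "hello from the zip")

# Import the package directly from the archive (no extraction).
sys.path.insert(0, str(zip_path))
importlib.invalidate_caches()
zipdemo_pkg = importlib.import_module("zipdemo_pkg")

# Approach 1: path arithmetic on __file__. The resulting path points
# "inside" the zip file, which is not a real directory on disk.
file_based = Path(zipdemo_pkg.__file__).parent / "templates" / "temp_file"
file_based_exists = file_based.exists()

# Approach 2: importlib.resources reads through the import system.
resource_bytes = (files("zipdemo_pkg") / "templates" / "temp_file").read_bytes()

print(file_based_exists)
print(resource_bytes)
```

The first lookup reports that the file does not exist, while the second returns the file contents.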
Using the pkg_resources APIs is described in the top-voted answer. It looks something like this:
from pkg_resources import resource_string
data = resource_string(__name__, "templates/temp_file")
What's wrong with that? It adds a runtime dependency on setuptools, which should preferably be an install time dependency only. Importing and using pkg_resources can become really slow, as the code builds up a working set of all installed packages, even though you were only interested in your own package resources. That's not a big deal at install time (since installation is once-off), but it's ugly at runtime.
Using the legacy importlib.resources functions was previously the recommendation of the top-voted answer. They have been in the standard library since Python 3.7. It looks like this:
from importlib.resources import read_binary
data = read_binary("mypackage.templates", "temp_file")
What's wrong with that? Well, unfortunately, the implementation left some things to be desired, and it was deprecated in Python 3.11. Using importlib.resources.read_binary, importlib.resources.read_text and friends will require you to add an empty file templates/__init__.py so that data files reside within a sub-package rather than in a subdirectory. It will also expose the mypackage/templates subdirectory as an importable mypackage.templates sub-package in its own right. This won't work with many existing packages which are already published using resource subdirectories instead of resource sub-packages, and it's inconvenient to add the __init__.py files everywhere, muddying the boundary between data and code.
This approach was deprecated in upstream importlib_resources in 2021, and was deprecated in the stdlib from Python 3.11. bpo-45514 tracked the deprecation, and the "migrating from legacy" guide offers _legacy.py wrappers to aid with the transition.
Even more confusingly, the functional APIs may become "undeprecated" again in Python 3.13, and the names remain the same but the usage is subtly different: read_binary, read_text.
Long before importlib.resources existed, there was a standard library pkgutil module for accessing resources. It actually still works fine! It looks like this in library code:
# within mypackage/mymodule1.py, for example
import pkgutil
data = pkgutil.get_data(__name__, "templates/temp_file")
It works in zips. It works on Python 2 and Python 3. It doesn't require any third-party dependencies. It's probably more battle-tested than importlib.resources, and might even be a better choice if you need to support a wide range of Python versions.
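For a self-contained illustration of pkgutil.get_data, the sketch below creates a throwaway package and reads a file from its templates/ subdirectory. The package name pkgutil_demo and the file contents are assumptions for the example; inside real library code you would pass __name__ (or the package name) instead.

```python
import importlib
import pkgutil
import sys
import tempfile
from pathlib import Path

# Build a hypothetical package "pkgutil_demo" with a plain data subdirectory.
root = Path(tempfile.mkdtemp())
pkg_dir = root / "pkgutil_demo"
(pkg_dir / "templates").mkdir(parents=True)
(pkg_dir / "__init__.py").write_text("")
(pkg_dir / "templates" / "temp_file").write_bytes(b"raw template bytes")

# Make the package importable, for this demonstration only.
sys.path.insert(0, str(root))
importlib.invalidate_caches()

# The resource name is a '/'-separated path relative to the package.
data = pkgutil.get_data("pkgutil_demo", "templates/temp_file")

# get_data always returns bytes; decode explicitly when text is needed.
text = data.decode("utf-8")
print(text)
```

Note that get_data returns None rather than raising when the package cannot be located or its loader does not support get_data, so callers may want to check for that.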
I've created an example project on GitHub and uploaded on PyPI, which demonstrates all five approaches discussed above. Try it out with:
$ pip install resources-example
$ resources-example
See https://github.com/wimglenn/resources-example for more info.
Upvotes: 242
Reputation: 11776
This is my standard way of doing it:
import importlib.resources as resources
from <your_package> import __name__ as pkg_name
template_path = resources.files(pkg_name) / "templates" / "temp_file"
with template_path.open() as f:
    template = f.read()
On a side note, and inspired by the Maven standard directory layout, I recommend the following project structure with a resources folder inside the package directory and the tests directory:
.
├── pyproject.toml
├── src
│   └── <your_package>
│       └── resources
└── tests
    └── resources
Then your temp_file would go in the resources folder and you would access the file like this:
template_path = resources.files(pkg_name) / "resources" / "temp_file"
Upvotes: 2
Reputation: 9493
importlib.resources module
If you don't care about backward compatibility with Python < 3.9 (explained in detail in method no. 2, below), use this:
from importlib import resources as impresources
from . import templates
inp_file = impresources.files(templates) / 'temp_file'
with inp_file.open("rt") as f:
    template = f.read()
The traditional pkg_resources from setuptools is not recommended anymore because the new method is more efficient and relies only on Python's standard library (no extra third-party dependency on setuptools).
I kept the traditional method listed first, to explain the differences with the new method when porting existing code.
Let's assume your templates are located in a folder nested inside your module's package:
<your-package>
+--<module-asking-the-file>
+--templates/
+--temp_file <-- We want this file.
Note 1: For sure, we should NOT fiddle with the __file__ attribute (e.g. code will break when served from a zip).
Note 2: If you are building this package, remember to declare your data files as package_data or data_files in your setup.py.
1. pkg_resources from setuptools (slow)
You may use the pkg_resources package from the setuptools distribution, but that comes with a cost, performance-wise:
import pkg_resources
# Could be any dot-separated package/module name or a "Requirement"
resource_package = __name__
resource_path = '/'.join(('templates', 'temp_file')) # Do not use os.path.join()
template = pkg_resources.resource_string(resource_package, resource_path)
# or for a file-like stream:
template = pkg_resources.resource_stream(resource_package, resource_path)
Tips:
- This will read data even if your distribution is zipped, so you may set zip_safe=True in your setup.py, and/or use the long-awaited zipapp packer from python-3.5 to create self-contained distributions.
- Remember to add setuptools into your run-time requirements (e.g. in install_requires).
... and notice that according to the Setuptools/pkg_resources docs, you should not use os.path.join:
Basic Resource Access
Note that resource names must be /-separated paths and cannot be absolute (i.e. no leading /) or contain relative names like "..". Do not use os.path routines to manipulate resource paths, as they are not filesystem paths.
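To see why os.path.join is the wrong tool here, the short sketch below compares what path joining yields under Windows rules with the '/'-separated name that the resource APIs expect. It uses the standard ntpath and posixpath modules so that it behaves the same on any platform; the variable names are made up for illustration.

```python
import ntpath
import posixpath

parts = ("templates", "temp_file")

# What os.path.join would produce on Windows (os.path is ntpath there):
windows_style = ntpath.join(*parts)

# What os.path.join would produce on Linux/macOS (os.path is posixpath there):
posix_style = posixpath.join(*parts)

# Platform-independent resource name, as the docs require:
resource_name = "/".join(parts)

print(repr(windows_style))
print(repr(posix_style))
print(repr(resource_name))
```

Only the explicit "/".join gives the same result everywhere, which is why the code above avoids os.path.join for resource names.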
2. importlib_resources library
Use the standard library's importlib.resources module, which is more efficient than setuptools, above:
try:
    from importlib import resources as impresources
except ImportError:
    # Try backported to PY<37 `importlib_resources`.
    import importlib_resources as impresources
from . import templates  # relative-import the *package* containing the templates
try:
    inp_file = (impresources.files(templates) / 'temp_file')
    with inp_file.open("rb") as f:  # or "rt" as text file with universal newlines
        template = f.read()
except AttributeError:
    # Python < PY3.9, fall back to method deprecated in PY3.11.
    template = impresources.read_text(templates, 'temp_file')
    # or for a file-like stream:
    template = impresources.open_text(templates, 'temp_file')
Attention:
Regarding the function read_text(package, resource):
- The package can be either a string or a module.
- The resource is NOT a path anymore, but just the filename of the resource to open, within an existing package; it may not contain path separators and it may not have sub-resources (i.e. it cannot be a directory).
For the example asked in the question, we must now:
- make <your_package>/templates/ into a proper package, by creating an empty __init__.py file in it,
- refer to it with a plain import statement (no more parsing package/module names),
- ask for resource_name = "temp_file" (no path).
Tips:
- To access a file inside your current module, set the package argument to __package__, e.g. impresources.read_text(__package__, 'temp_file') (thanks to @ben-mares).
- Things become interesting when an actual filename is asked with path(), since now context-managers are used for temporarily-created files.
- Add the backported library, conditionally for older Pythons, with install_requires=["importlib_resources ; python_version<'3.7'"] (this needs checking if you package your project with setuptools<36.2.1).
- Remember to remove the setuptools library from your runtime requirements, if you migrated from the traditional method.
- Remember to customize setup.py or MANIFEST to include any static files.
- You may also set zip_safe=True in your setup.py.
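When a real filesystem path is needed (for example to hand to a library that only accepts filenames), the context-manager pattern mentioned in the tips can be sketched with importlib.resources.as_file, the files()-based counterpart of path() available from Python 3.9. The package name asfile_demo and its contents are assumptions made up for this example.

```python
import importlib
import sys
import tempfile
from importlib.resources import as_file, files
from pathlib import Path

# Build a hypothetical package "asfile_demo" holding one data file.
root = Path(tempfile.mkdtemp())
pkg_dir = root / "asfile_demo"
(pkg_dir / "templates").mkdir(parents=True)
(pkg_dir / "__init__.py").write_text("")
(pkg_dir / "templates" / "temp_file").write_text("content", encoding="utf-8")

# Make the package importable, for this demonstration only.
sys.path.insert(0, str(root))
importlib.invalidate_caches()

resource = files("asfile_demo") / "templates" / "temp_file"

# as_file() yields a pathlib.Path that is guaranteed to exist on disk
# inside the with-block. For zipped packages it may be a temporary copy
# that is cleaned up on exit, so do not keep the path around afterwards.
with as_file(resource) as real_path:
    was_real_file = real_path.is_file()
    content = real_path.read_text(encoding="utf-8")

print(was_real_file, content)
```

If the data only needs to be read, calling read_text() or read_bytes() on the resource directly avoids the temporary file altogether.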
Upvotes: 311
Reputation: 5653
The accepted answer should be to use importlib.resources. pkgutil.get_data also requires the argument package be a non-namespace package (see pkgutil docs). Hence, the directory containing the resource must have an __init__.py file, making it have the exact same limitations as importlib.resources. If the overhead issue of pkg_resources is not a concern, this is also an acceptable alternative.
Before Python 3.3, all packages were required to have an __init__.py. Since Python 3.3, a folder doesn't need an __init__.py to be a package. This is called a namespace package. Unfortunately, pkgutil does not work with namespace packages (see pkgutil docs).
For example, with the package structure:
+-- foo/
| +-- __init__.py
| +-- bar/
| | +-- hi.txt
where hi.txt just has Hi!, you get the following:
>>> import pkgutil
>>> rsrc = pkgutil.get_data("foo.bar", "hi.txt")
>>> print(rsrc)
None
However, with an __init__.py in bar, you get:
>>> import pkgutil
>>> rsrc = pkgutil.get_data("foo.bar", "hi.txt")
>>> print(rsrc)
b'Hi!'
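The two interpreter sessions above can be reproduced in a single script. The sketch below assumes two made-up top-level packages, nsdemo and regdemo, that differ only in whether bar/ contains an __init__.py.

```python
import importlib
import pkgutil
import sys
import tempfile
from pathlib import Path

root = Path(tempfile.mkdtemp())

# Hypothetical package "nsdemo": bar/ has no __init__.py,
# so nsdemo.bar is a namespace package.
(root / "nsdemo" / "bar").mkdir(parents=True)
(root / "nsdemo" / "__init__.py").write_text("")
(root / "nsdemo" / "bar" / "hi.txt").write_text("Hi!")

# Hypothetical package "regdemo": same layout, but bar/ is a regular package.
(root / "regdemo" / "bar").mkdir(parents=True)
(root / "regdemo" / "__init__.py").write_text("")
(root / "regdemo" / "bar" / "__init__.py").write_text("")
(root / "regdemo" / "bar" / "hi.txt").write_text("Hi!")

# Make both packages importable, for this demonstration only.
sys.path.insert(0, str(root))
importlib.invalidate_caches()

# The namespace-package loader does not support get_data.
from_namespace = pkgutil.get_data("nsdemo.bar", "hi.txt")

# The regular package works as expected.
from_regular = pkgutil.get_data("regdemo.bar", "hi.txt")

print(from_namespace)
print(from_regular)
```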
Upvotes: 0
Reputation: 136665
In case you have this structure
lidtk
├── bin
│ └── lidtk
├── lidtk
│ ├── analysis
│ │ ├── char_distribution.py
│ │ └── create_cm.py
│ ├── classifiers
│ │ ├── char_dist_metric_train_test.py
│ │ ├── char_features.py
│ │ ├── cld2
│ │ │ ├── cld2_preds.txt
│ │ │ └── cld2wili.py
│ │ ├── get_cld2.py
│ │ ├── text_cat
│ │ │ ├── __init__.py
│ │ │ ├── README.md <---------- say you want to get this
│ │ │ └── textcat_ngram.py
│ │ └── tfidf_features.py
│ ├── data
│ │ ├── __init__.py
│ │ ├── create_ml_dataset.py
│ │ ├── download_documents.py
│ │ ├── language_utils.py
│ │ ├── pickle_to_txt.py
│ │ └── wili.py
│ ├── __init__.py
│ ├── get_predictions.py
│ ├── languages.csv
│ └── utils.py
├── README.md
├── setup.cfg
└── setup.py
you need this code:
import pkg_resources
# __name__ in case you're within the package
# - otherwise it would be 'lidtk' in this example as it is the package name
path = 'classifiers/text_cat/README.md' # always use slash
filepath = pkg_resources.resource_filename(__name__, path)
The strange "always use slash" part comes from the setuptools APIs:
Also notice that if you use paths, you must use a forward slash (/) as the path separator, even if you are on Windows. Setuptools automatically converts slashes to appropriate platform-specific separators at build time.
Upvotes: 18
Reputation: 2332
Section 10.8, "Reading Datafiles Within a Package", of Python Cookbook, Third Edition by David Beazley and Brian K. Jones gives the answer.
I'll just quote it here:
Suppose you have a package with files organized as follows:
mypackage/
__init__.py
somedata.dat
spam.py
Now suppose the file spam.py wants to read the contents of the file somedata.dat. To do it, use the following code:
import pkgutil
data = pkgutil.get_data(__package__, 'somedata.dat')
The resulting variable data will be a byte string containing the raw contents of the file.
The first argument to get_data() is a string containing the package name. You can either supply it directly or use a special variable, such as __package__. The second argument is the relative name of the file within the package. If necessary, you can navigate into different directories using standard Unix filename conventions as long as the final directory is still located within the package.
In this way, the package can be installed as a directory, .zip, or .egg.
Upvotes: 20
Reputation: 2422
Assuming you are using an egg file that is not extracted:
I "solved" this in a recent project by using a postinstall script that extracts my templates from the egg (zip file) to the proper directory in the filesystem. It was the quickest, most reliable solution I found, since working with __path__[0] can go wrong sometimes (I don't recall the name, but I came across at least one library that added something in front of that list!).
Also, egg files are usually extracted on the fly to a temporary location called the "egg cache". You can change that location using an environment variable, either before starting your script or even later, e.g.:
os.environ['PYTHON_EGG_CACHE'] = path
However, there is pkg_resources, which might do the job properly.
Upvotes: -3