Reputation: 10420
I am writing a Python package with modules that need to open data files in a ./data/ subdirectory. Right now I have the paths to the files hardcoded into my classes and functions. I would like to write more robust code that can access the subdirectory regardless of where it is installed on the user's system.
I've tried a variety of methods, but so far I have had no luck. It seems that most of the "current directory" commands return the directory of the system's Python interpreter, not the directory of the module.
This seems like it ought to be a trivial, common problem, yet I can't seem to figure it out. Part of the problem is that my data files are not .py files, so I can't use import functions and the like.
Any suggestions?
Right now my package directory looks like:
/
    __init__.py
    module1.py
    module2.py
    data/
        data.txt
I am trying to access data.txt from module*.py!
Upvotes: 158
Views: 82751
Reputation: 3764
Please note that this answer was originally written in 2011. As far as I know it still works in 2025, but the setuptools docs now recommend importlib.resources instead.
New: importlib
Docs: https://docs.python.org/3.11/library/importlib.resources.html#module-importlib.resources
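from importlib.resources import files, as_file  # or the importlib_resources backport from PyPI
import my.package.foo.bar

source = files(my.package.foo.bar).joinpath('myfilename')
# NOTE: my.package.foo.bar is an imported package object here; files() also
# accepts its dotted name as a string, e.g. files('my.package.foo.bar').
# The filename passed to joinpath() is a string.
with as_file(source) as myfile:
    # myfile is a real filesystem path, valid only inside this block
    contents = myfile.read_bytes()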
More info in usage guide: https://importlib-resources.readthedocs.io/en/latest/using.html#using-importlib-resources
Migration guide from setuptools: https://importlib-resources.readthedocs.io/en/latest/migration.html
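As a minimal sketch for the layout in the question, assuming Python 3.9+ and the stdlib importlib.resources (the importlib_resources backport provides the same files() and as_file()): the package name mypkg and the helpers read_data_text() and data_file() are hypothetical. The data/ subdirectory is reached with the / operator on the object returned by files(), so it does not need its own __init__.py. The demo at the bottom only builds a throwaway copy of the layout so the snippet runs on its own.

```python
import sys
import tempfile
from importlib.resources import as_file, files
from pathlib import Path


def read_data_text(package: str, filename: str) -> str:
    """Return the text of data/<filename> inside `package` (hypothetical helper)."""
    # files() takes the package's dotted name (or the imported package object)
    # and returns a Traversable for the package directory; "/" descends into
    # the data/ subdirectory, which needs no __init__.py of its own.
    resource = files(package) / "data" / filename
    return resource.read_text(encoding="utf-8")


def data_file(package: str, filename: str):
    """Context manager yielding a real filesystem path to data/<filename>."""
    # as_file() extracts the resource to a temporary file if the package is
    # not on a regular filesystem (e.g. inside a zip); the path is only
    # guaranteed to exist inside the with-block.
    return as_file(files(package) / "data" / filename)


if __name__ == "__main__":
    # Demo only: build a throwaway copy of the question's layout as "mypkg".
    root = Path(tempfile.mkdtemp())
    (root / "mypkg" / "data").mkdir(parents=True)
    (root / "mypkg" / "__init__.py").write_text("")
    (root / "mypkg" / "data" / "data.txt").write_text("hello\n", encoding="utf-8")
    sys.path.insert(0, str(root))

    print(read_data_text("mypkg", "data.txt"), end="")  # prints: hello
    with data_file("mypkg", "data.txt") as path:
        print(path.exists())  # prints: True
```

Inside module1.py you would typically call read_data_text(__package__, "data.txt") or pass the package name explicitly.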
Old: setuptools
The standard way to do this is with setuptools packages and pkg_resources.
You can lay out your package according to the following hierarchy and configure the package setup file to point to your data resources, as per this link:
https://docs.python.org/3.11/distutils/setupscript.html#installing-package-data
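For the layout in the question, the setup-script side might look like this minimal sketch. The name mypkg and the version are hypothetical placeholders, and setup.py is assumed to sit one level above the package directory; package_data is the option described at the link above.

```python
# setup.py, assumed to sit one level above the package directory:
#   setup.py
#   mypkg/
#       __init__.py
#       module1.py
#       module2.py
#       data/
#           data.txt
from setuptools import setup

setup(
    name="mypkg",
    version="0.1",
    packages=["mypkg"],
    # Install data/*.txt inside the package, next to the modules, so the
    # files can be located at runtime relative to the installed package.
    package_data={"mypkg": ["data/*.txt"]},
)
```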
You can then re-find and use those files using pkg_resources, as per this link:
http://peak.telecommunity.com/DevCenter/PkgResources#basic-resource-access
import pkg_resources
DATA_PATH = pkg_resources.resource_filename('<package name>', 'data/')
DB_FILE = pkg_resources.resource_filename('<package name>', 'data/sqlite.db')
Upvotes: 196
Reputation: 134038
There is often no point in posting an answer with code that does not work as is, but I believe this is an exception. Python 3.7 added importlib.resources, which is supposed to replace pkg_resources. It works for accessing files within packages as long as the resource names contain no slashes, i.e.
foo/
    __init__.py
    module1.py
    module2.py
    data/
        data.txt
    data2.txt
That is, you could access data2.txt inside package foo with, for example:
importlib.resources.open_binary('foo', 'data2.txt')
but it would fail with an exception for
>>> importlib.resources.open_binary('foo', 'data/data.txt')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python3.7/importlib/resources.py", line 87, in open_binary
    resource = _normalize_path(resource)
  File "/usr/lib/python3.7/importlib/resources.py", line 61, in _normalize_path
    raise ValueError('{!r} must be only a file name'.format(path))
ValueError: 'data/data.txt' must be only a file name
This cannot be fixed except by placing an __init__.py in data and then using it as a package:
importlib.resources.open_binary('foo.data', 'data.txt')
The reason for this behaviour is "it is by design"; but the design might change...
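A self-contained sketch of the workaround described above, assuming the foo layout from this answer plus an added, empty foo/data/__init__.py. The demo builds that layout in a temporary directory; the file contents are made up. open_binary() is the function this answer uses; on some Python versions it emits a DeprecationWarning in favour of files(), but it still works.

```python
import importlib
import importlib.resources
import sys
import tempfile
from pathlib import Path

# Demo only: build foo/ with data2.txt at the top level and data/ turned
# into a subpackage by giving it an (empty) __init__.py.
root = Path(tempfile.mkdtemp())
foo = root / "foo"
(foo / "data").mkdir(parents=True)
(foo / "__init__.py").write_text("")
(foo / "data" / "__init__.py").write_text("")
(foo / "data2.txt").write_bytes(b"top-level resource\n")
(foo / "data" / "data.txt").write_bytes(b"resource in data/\n")
sys.path.insert(0, str(root))
importlib.invalidate_caches()

# A plain file name directly inside the package works.
with importlib.resources.open_binary("foo", "data2.txt") as f:
    top = f.read()

# For the file in data/, address the subpackage "foo.data" instead of
# passing "data/data.txt" as the resource name.
with importlib.resources.open_binary("foo.data", "data.txt") as f:
    nested = f.read()

print(top)     # b'top-level resource\n'
print(nested)  # b'resource in data/\n'
```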
Upvotes: 36
Reputation: 281795
You can use __file__ to get the path to the package, like this:
import os

this_dir, this_filename = os.path.split(__file__)
DATA_PATH = os.path.join(this_dir, "data", "data.txt")
# Python 3: print is a function; the with-block closes the file afterwards
with open(DATA_PATH) as f:
    print(f.read())
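The same __file__-based idea can also be written with pathlib; this is a swapped-in variant, not the code above. The helper data_path() is hypothetical and takes the module's file path as a parameter (in module1.py you would pass __file__) so the sketch can be exercised on its own. Like the os.path version, it assumes the package is installed as ordinary files on disk (for example, not imported from a zip archive).

```python
import tempfile
from pathlib import Path


def data_path(module_file: str, *parts: str) -> Path:
    """Return <directory of module_file>/data/<parts...> (hypothetical helper)."""
    # resolve() makes the path absolute, so the result does not depend on
    # the current working directory the interpreter was started from.
    return Path(module_file).resolve().parent.joinpath("data", *parts)


if __name__ == "__main__":
    # Demo only: fake a package directory with data/data.txt in it.
    pkg_dir = Path(tempfile.mkdtemp())
    (pkg_dir / "data").mkdir()
    (pkg_dir / "data" / "data.txt").write_text("hello\n", encoding="utf-8")
    module_file = pkg_dir / "module1.py"  # stands in for __file__

    print(data_path(str(module_file), "data.txt").read_text(encoding="utf-8"), end="")
```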
Upvotes: 23
Reputation: 3329
To provide a solution that works today: definitely use this API rather than reinventing all those wheels.
If a true filesystem filename is needed (zipped eggs will be extracted to a cache directory):
from pkg_resources import resource_filename, Requirement
path_to_vik_logo = resource_filename(Requirement.parse("enb.portals"), "enb/portals/reports/VIK_logo.png")
If a readable file-like object for the specified resource is enough (it may be an actual file, a StringIO, or some similar object; the stream is in “binary mode”, in the sense that whatever bytes are in the resource will be read as-is):
from pkg_resources import resource_stream, Requirement
vik_logo_as_stream = resource_stream(Requirement.parse("enb.portals"), "enb/portals/reports/VIK_logo.png")
Package Discovery and Resource Access using pkg_resources
Upvotes: 17
Reputation: 18169
You need a name for your whole package; your directory tree doesn't list that detail. For me, this worked:
import pkg_resources
print(
pkg_resources.resource_filename(__name__, 'data/data.txt')
)
Notably, setuptools does not appear to resolve files based on a name match with packaged data files, so you're going to have to include the data/ prefix pretty much no matter what. You can use os.path.join('data', 'data.txt') if you need alternate directory separators, though generally I find no compatibility problems with hard-coded Unix-style directory separators.
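For what it's worth, the pkg_resources documentation describes resource names as '/'-separated paths that should not be built with os.path routines (on Windows, os.path.join joins with a backslash). If you want to assemble a name from parts instead of hard-coding it, one option is pathlib.PurePosixPath, which always joins with '/'. This is a swapped-in suggestion rather than something pkg_resources provides, and resource_name() is a hypothetical helper.

```python
from pathlib import PurePosixPath


def resource_name(*parts: str) -> str:
    """Join resource-name parts with '/' on every OS (hypothetical helper)."""
    # PurePosixPath never uses the host OS separator, unlike os.path.join,
    # which joins with a backslash on Windows.
    return PurePosixPath(*parts).as_posix()


print(resource_name("data", "data.txt"))  # data/data.txt
```

The result can then be passed as the second argument, e.g. pkg_resources.resource_filename(__name__, resource_name('data', 'data.txt')).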
Upvotes: 8
Reputation: 10420
I think I hunted down an answer.
I created a module data_path.py containing the following, and I import it into my other modules:
import os
data_path = os.path.join(os.path.dirname(__file__), 'data')
And then I open all my files with
open(os.path.join(data_path,'filename'), <param>)
Upvotes: 6