Reputation: 5286
Hi, I am working with Python packaging. I have 3 non-code files, namely ['synonyms.csv', 'acronyms.csv', 'words.txt']. They live in Wordproject/WordProject/Repository/DataBank/. There is a RepositoryReader class at the path Wordproject/WordProject/Repository/. The code gets the location of RepositoryReader, then looks for a subdirectory called DataBank and looks for the 3 files there.
The problem is that when I create an egg out of the code and then run it, my code gives me the error:
Could not find the file at X:\1. Projects\Python\Wordproject\venv\lib\site-packages\Wordproject-1.0-py3.6.egg\Wordproject\Repository\DataBank\synonyms.csv
It is not able to fetch or read the file when the path points inside an egg. Is there any way around this? These files have to be in the egg.
Upvotes: 8
Views: 7387
Reputation: 4344
If you're using Python 3.7 or later, I suggest using importlib_resources. From their doc https://importlib-resources.readthedocs.io/en/latest/using.html here's an example of getting a YAML file tucked into a module:
from importlib_resources import files, as_file
yaml_path = files('my-module').joinpath('openapi.yml')
with as_file(yaml_path) as yaml:
    conn_app.add_api(yaml)
This works if the module is installed in a directory via pip3 install .
and also if installed as an egg (zip) file via python3 setup.py install
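For the CSV case in the question, the same idea can be sketched with the standard library's importlib.resources.files (available from Python 3.9), which mirrors the importlib_resources backport. The package name demo_wordpkg, the archive name, and the file contents below are made up for the demonstration; the zipped package stands in for an installed egg.

```python
import sys
import tempfile
import zipfile
from importlib.resources import files  # standard library in Python 3.9+
from pathlib import Path


def read_packaged_text(package: str, *parts: str) -> str:
    """Return the text of a data file stored inside an importable package."""
    resource = files(package)
    for part in parts:
        resource = resource / part  # same call for a directory or a zip archive
    return resource.read_text(encoding="utf-8")


# Demo: build a throwaway zipped package (hypothetical names) and read from it.
archive = Path(tempfile.mkdtemp()) / "demo_wordpkg-1.0.egg"
with zipfile.ZipFile(archive, "w") as zf:
    zf.writestr("demo_wordpkg/__init__.py", "")
    zf.writestr("demo_wordpkg/DataBank/synonyms.csv", "big,large\n")
sys.path.insert(0, str(archive))  # make the zipped package importable

text = read_packaged_text("demo_wordpkg", "DataBank", "synonyms.csv")
print(text)
```

Because the lookup starts from the package name rather than from a filesystem path, the calling code does not need to know whether the package is zipped.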
Upvotes: 0
Reputation: 1049
Based on the documentation, we can read the contents of the file in multiple ways.
Solution 1: Read the contents of the file directly into memory, without extracting the file locally.
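import tempfile
import zipfile
tfile = tempfile.NamedTemporaryFile()
with zipfile.ZipFile('/path/to/egg.egg') as myzip:
    with myzip.open('relative/path/to/file.txt') as myfile:
        # copy the member's bytes into the temporary file
        tfile.write(myfile.read())
tfile.flush()  # make sure the data is on disk before the file is used by name
# .. do something with temporary file
tfile.close()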
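Now tfile is your local temporary file handle. Its name is tfile.name, and file operations such as open(tfile.name) work as usual on it (on Windows the file cannot be reopened by name while this handle is still open). tfile.close() must be called at the end to close the handle, which by default also deletes the file.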
The contents of the file can also be read with myfile.read() itself, but the myfile handle is lost as soon as we exit the context. So the contents are copied into a temporary file if they need to be passed around for other operations.
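If the data only needs to be parsed and not passed around as a file, the temporary file can be skipped altogether. The following is a minimal sketch of that variant, not part of the solution above: read_csv_from_archive is a hypothetical helper, and the archive built in the demo only imitates the egg layout from the question.

```python
import csv
import io
import tempfile
import zipfile
from pathlib import Path


def read_csv_from_archive(archive_path, member):
    """Parse a CSV member of a zip/egg archive without writing it to disk."""
    with zipfile.ZipFile(archive_path) as archive:
        with archive.open(member) as raw:  # binary stream over the member
            # Decode on the fly so csv.reader receives str rows.
            with io.TextIOWrapper(raw, encoding="utf-8", newline="") as text:
                return list(csv.reader(text))


# Demo with a throwaway archive (hypothetical contents).
demo_egg = Path(tempfile.mkdtemp()) / "demo.egg"
with zipfile.ZipFile(demo_egg, "w") as zf:
    zf.writestr("Wordproject/Repository/DataBank/synonyms.csv",
                "big,large\nsmall,tiny\n")

rows = read_csv_from_archive(demo_egg, "Wordproject/Repository/DataBank/synonyms.csv")
print(rows)
```

The rows are fully materialized before the archive is closed, so the returned list stays valid after the function returns.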
Solution 2: Extract the member of the egg locally
zipfile provides an API for extracting a specific member:
import zipfile
x = zipfile.ZipFile('/path/to/egg.egg')
x.extractall(path='temp/dest/folder', members=['path/to/file.txt'])
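A slightly more defensive sketch of the same idea extracts one member into a fresh temporary directory and returns the resulting path, using ZipFile.extract, which reports where the file was written. The helper name extract_member and the demo archive are assumptions for illustration only.

```python
import tempfile
import zipfile
from pathlib import Path


def extract_member(archive_path, member) -> Path:
    """Extract one member of a zip/egg into a new temp directory and return its path."""
    dest = tempfile.mkdtemp(prefix="egg_data_")
    with zipfile.ZipFile(archive_path) as archive:
        # extract() returns the normalized path of the file it created
        extracted = archive.extract(member, path=dest)
    return Path(extracted)


# Demo with a throwaway archive (hypothetical contents).
demo_egg = Path(tempfile.mkdtemp()) / "demo.egg"
with zipfile.ZipFile(demo_egg, "w") as zf:
    zf.writestr("Wordproject/Repository/DataBank/words.txt", "alpha\nbeta\n")

local_path = extract_member(demo_egg, "Wordproject/Repository/DataBank/words.txt")
print(local_path, local_path.read_text())
```

The caller owns the temporary directory and is responsible for removing it, for example with shutil.rmtree, once the file is no longer needed.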
Solution 3: Extract the whole egg
Another solution is to extract the egg into a temporary folder and then read the file. The egg can be extracted on the command line as follows:
python -m zipfile -e path/to/my.egg ./temp_destination
Upvotes: 0
Reputation: 37539
egg files are just renamed .zip files. You can use the zipfile library to open the egg and extract or read the file you need.
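import zipfile
# an egg is a zip archive, so zipfile can open it directly
with zipfile.ZipFile('/path/to/file.egg', 'r') as egg:
    # open the file from within the egg; member names are relative to the egg root,
    # e.g. the part of the path in the error message that follows the .egg file
    with egg.open('Wordproject/Repository/DataBank/synonyms.csv', 'r') as f:
        txt = f.read()  # bytes; call txt.decode() to get text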
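The path in the question's error message points inside the egg, so one way to keep the existing path-building code is to split such a path into the archive and the member name before reading. This is only a sketch under that assumption; split_archive_path and read_bytes_anywhere are hypothetical helpers, not part of zipfile.

```python
import tempfile
import zipfile
from pathlib import Path
from typing import Optional, Tuple


def split_archive_path(path: str) -> Optional[Tuple[str, str]]:
    """Split 'x/pkg.egg/a/b.csv' into ('x/pkg.egg', 'a/b.csv'); None if no archive is found."""
    p = Path(path)
    for parent in p.parents:
        # An egg shows up as a regular file somewhere in the middle of the path.
        if parent.is_file() and zipfile.is_zipfile(str(parent)):
            return str(parent), p.relative_to(parent).as_posix()
    return None


def read_bytes_anywhere(path: str) -> bytes:
    """Read a file whether it sits on disk or inside a zip/egg archive."""
    split = split_archive_path(path)
    if split is None:
        return Path(path).read_bytes()
    archive, member = split
    with zipfile.ZipFile(archive) as egg:
        return egg.read(member)


# Demo with a throwaway archive (hypothetical contents).
demo_egg = Path(tempfile.mkdtemp()) / "Wordproject-1.0.egg"
with zipfile.ZipFile(demo_egg, "w") as zf:
    zf.writestr("Wordproject/Repository/DataBank/synonyms.csv", "big,large\n")

inside = demo_egg / "Wordproject" / "Repository" / "DataBank" / "synonyms.csv"
data = read_bytes_anywhere(str(inside))
print(data)
```

This keeps working when the package is installed unzipped, because a path with no archive in it falls through to a normal file read.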
Upvotes: 4
Reputation: 365797
There are two different things you could be trying to do here: keep the data files inside the egg and access them there at runtime, or extract them at pip install time, to a location you can access normally.
Both are explained in the section on data files in the PyPA/setuptools docs. I think you want the first one here, which is covered in the subsection on Accessing Data Files at Runtime:
Typically, existing programs manipulate a package’s __file__ attribute in order to find the location of data files. However, this manipulation isn’t compatible with PEP 302-based import hooks, including importing from zip files and Python Eggs. It is strongly recommended that, if you are using data files, you should use the ResourceManager API of pkg_resources to access them. The pkg_resources module is distributed as part of setuptools, so if you’re using setuptools to distribute your package, there is no reason not to use its resource management API. See also Accessing Package Resources for a quick example of converting code that uses __file__ to use pkg_resources instead.
Follow that link, and you find what look like some crufty old PEAK docs, but that's only because they really are crufty old PEAK docs. There is a version buried inside the setuptools
docs that you may find easier to read and navigate once you manage to find it.
As it says, you could try
using get_data
(which will work inside an egg/zip) and then fall back to accessing a file (which will work when running from source), but you're better off using the wrappers in pkg_resources
. Basically, if your code was doing this:
import os
# os.path.dirname(__file__) is the directory that contains this module
path = os.path.join(os.path.dirname(__file__), 'Wordproject/WordProject/Repository/DataBank/', datathingy)
with open(path) as f:
    for line in f:
        do_stuff(line)
… you'll change it to this:
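import pkg_resources
# the resource path is resolved relative to the package or module named by the first argument
path = 'Wordproject/WordProject/Repository/DataBank/' + datathingy
f = pkg_resources.resource_stream(__name__, path)
for line in f:
    do_stuff(line.decode())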
Notice that resource_stream
files are always opened in binary mode. So if you want to read them as text, you need to wrap a TextIOWrapper
around them, or decode each line.
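The TextIOWrapper variant can be sketched as follows. To keep the example runnable without setuptools, it uses the standard library's pkgutil.get_data plus io.BytesIO as a stand-in for the binary stream; the same wrapper should work around the object returned by resource_stream. The package name demo_resourcepkg and its contents are made up for the demonstration.

```python
import io
import pkgutil
import sys
import tempfile
import zipfile
from pathlib import Path


def iter_text_lines(package: str, resource: str, encoding: str = "utf-8"):
    """Yield decoded lines of a packaged data file, from a directory or a zip/egg."""
    data = pkgutil.get_data(package, resource)  # bytes, or None if the loader has no get_data
    if data is None:
        raise FileNotFoundError(f"{resource!r} is not available from {package!r}")
    # Wrap the binary stream so that iteration yields str instead of bytes.
    with io.TextIOWrapper(io.BytesIO(data), encoding=encoding) as text:
        for line in text:
            yield line.rstrip("\n")


# Demo: a throwaway zipped package (hypothetical names) placed on sys.path.
archive = Path(tempfile.mkdtemp()) / "demo_resourcepkg-1.0.egg"
with zipfile.ZipFile(archive, "w") as zf:
    zf.writestr("demo_resourcepkg/__init__.py", "")
    zf.writestr("demo_resourcepkg/DataBank/words.txt", "alpha\nbeta\n")
sys.path.insert(0, str(archive))

words = list(iter_text_lines("demo_resourcepkg", "DataBank/words.txt"))
print(words)
```

Decoding once at the stream level avoids sprinkling .decode() calls through the code that consumes the lines.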
Upvotes: 1