Reputation: 1422
I use NLTK with WordNet in my project. I did the installation manually on my PC, with pip:
pip3 install nltk --user
in a terminal, then nltk.download() in a Python shell to download WordNet.
I want to automate these steps with a setup.py file, but I don't know a good way to install WordNet.
For the moment, I have this piece of code after the call to setup ("nltk" is in the install_requires list of the call to setup):
import sys
if 'install' in sys.argv:
    import nltk
    nltk.download("wordnet")
Is there a better way to do this?
Upvotes: 16
Views: 4752
Reputation: 649
This setup worked for me:
from setuptools import setup, find_packages
from setuptools.command.install import install

class InstallCommand(install):
    def run(self):
        install.run(self)
        import nltk
        nltk.download('wordnet')

setup(
    # other options...
    install_requires=['nltk'],
    setup_requires=['nltk'],
    cmdclass={
        'install': InstallCommand,
    }
)
Upvotes: 0
Reputation: 602
As stated in this thread, external data should not be handled by setuptools in setup.py. As an alternative, I suggest including the following lines in the __init__.py file of your package (assuming you want to download punkt and stopwords):
__version__ = "x.x.x"
__organization__ = "your_organization"
import nltk
nltk.download("stopwords")
nltk.download("punkt")
This way the files will not be downloaded when the package is installed, but when it is imported (i.e. import my_package).
As an example I share a link to a python library that does just this.
First you would have to install the library:
pip install -U pyleetspeak
And then importing the library will download the NLTK files:
import pyleetspeak
pyleetspeak.__version__
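Note that an unconditional nltk.download() call in __init__.py runs the downloader on every import. A minimal sketch of a guarded variant follows, which only downloads what nltk.data.find() cannot locate. The helper name ensure_nltk_data, its find/download parameters, and the resource paths in REQUIRED are assumptions for illustration (the paths follow the usual nltk_data layout); the injectable callables are there only so the logic can be exercised without NLTK or network access.

```python
def ensure_nltk_data(resources, find=None, download=None):
    """Download each missing NLTK resource and return the names fetched.

    `resources` maps a package name (e.g. "stopwords") to the path used
    to look it up (e.g. "corpora/stopwords").
    `find` and `download` are hypothetical hooks; by default they are
    nltk.data.find and nltk.download.
    """
    if find is None or download is None:
        import nltk  # imported lazily so the helper is testable without NLTK
        find = find or nltk.data.find
        download = download or nltk.download

    fetched = []
    for name, path in resources.items():
        try:
            find(path)  # raises LookupError when the resource is absent
        except LookupError:
            download(name)
            fetched.append(name)
    return fetched


# Assumed lookup paths for the two packages mentioned above.
REQUIRED = {
    "stopwords": "corpora/stopwords",
    "punkt": "tokenizers/punkt",
}
```

In __init__.py this would be called as ensure_nltk_data(REQUIRED) in place of the two nltk.download() lines.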
Upvotes: 1
Reputation: 11756
I managed to install the NLTK data in setup.py by overriding cmdclass with my own Install class:
from setuptools import setup, find_packages
from setuptools.command.install import install as _install

class Install(_install):
    def run(self):
        # Install the package and its dependencies first, so nltk is importable
        _install.do_egg_install(self)
        import nltk
        nltk.download("popular")

setup(
    # ...
    cmdclass={'install': Install},
    # ...
    install_requires=[
        'nltk',
    ],
    setup_requires=['nltk'],
    # ...
)
It is important to call do_egg_install() in your run() method to make sure nltk gets installed before import nltk is executed (see also the question "python setuptools install_requires is ignored when overriding cmdclass"). Also, don't forget to add nltk to setup_requires.
Upvotes: 14
Reputation: 381
You can also automate installation with a shell script, for example, running (after pip installing nltk):
python -m nltk.downloader -d /usr/share/nltk_data wordnet
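If the surrounding automation is written in Python rather than shell, the same command can be launched through subprocess. This is only a sketch: the function names downloader_command and download_packages are made up for illustration, and the target directory is the one from the command above.

```python
import subprocess
import sys


def downloader_command(packages, target_dir=None):
    """Build the argument list for `python -m nltk.downloader`."""
    # sys.executable makes sure the interpreter that has nltk installed is used
    cmd = [sys.executable, "-m", "nltk.downloader"]
    if target_dir is not None:
        cmd += ["-d", target_dir]
    return cmd + list(packages)


def download_packages(packages, target_dir=None):
    """Run the downloader; raises CalledProcessError on a non-zero exit."""
    subprocess.run(downloader_command(packages, target_dir), check=True)
```

For example, download_packages(["wordnet"], "/usr/share/nltk_data") runs the equivalent of the shell command above.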
Upvotes: 3