Reputation: 425
I have a simple command:
python -m spacy download en_core_web
And I cannot for the life of me figure out where it downloads. I search for "en_core_web" but can find absolutely nothing, anywhere. And I can't for the life of me figure out what to search to understand the syntax behind this command.
What do you even call this line? A python command line argument? I couldn't find what to search for to specify a download location.
Please help!
Upvotes: 37
Views: 21266
Reputation: 509
I stumbled across the same question. The model path can be found via the _path attribute of a loaded spaCy model.
For instance, having completed the model download at the command line as follows:
python -m spacy download en_core_web_sm
then within the python shell:
import spacy
model = spacy.load("en_core_web_sm")
model._path
This will show you where the model has been installed on your system.
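Because the download command installs the model as a regular Python package, its location can also be looked up without loading the model. The following is a minimal sketch using only the standard library; the helper name find_package_dir is made up for illustration, and it assumes the model was installed under the package name en_core_web_sm.

```python
import importlib.util
from pathlib import Path
from typing import Optional


def find_package_dir(package_name: str) -> Optional[Path]:
    """Return the directory of an installed top-level package, or None if it is not installed."""
    spec = importlib.util.find_spec(package_name)
    if spec is None or spec.origin is None:
        return None
    # spec.origin points at the package's __init__.py; its parent is the package directory.
    return Path(spec.origin).parent


if __name__ == "__main__":
    # Assumes the model was installed with: python -m spacy download en_core_web_sm
    # Prints None if no package with that name is installed.
    print(find_package_dir("en_core_web_sm"))
```

This reports the package directory only; the path shown by model._path may point at a subdirectory inside it.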
If you want to download to a different location, I believe you can write the following at the command line:
python -m spacy.en.download en_core_web_sm --data-path /some/dir
Hope that helps
Upvotes: 31
Reputation: 1
After having trouble finding the download directory, I found that, at least on Linux, the initial download (before the model is installed into the Python library) seems to be controlled by pip, and the python -m spacy download ... command accepts pip options. So --cache-dir <dir> changes the download location, and in my case --no-cache-dir fixed my problem of running out of space in the default download location. I believe the actual installed location will be alongside your other Python libraries, wherever those are, depending on your distribution and virtual environment.
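As a rough sketch of the above, the command can be assembled in Python with the pip options appended after the model name. The helper build_download_command is hypothetical, and the example assumes, as described in this answer, that the spaCy download command forwards extra options to pip.

```python
import sys
from typing import List, Sequence


def build_download_command(model: str, pip_args: Sequence[str] = ()) -> List[str]:
    """Build the argv for `python -m spacy download`, appending any pip options."""
    # sys.executable keeps the download in the same interpreter / virtual environment.
    return [sys.executable, "-m", "spacy", "download", model, *pip_args]


if __name__ == "__main__":
    # --no-cache-dir is a standard pip option that skips pip's download cache.
    cmd = build_download_command("en_core_web_sm", ["--no-cache-dir"])
    # Only prints the command; pass cmd to subprocess.run(cmd, check=True) to execute it.
    print(" ".join(cmd))
```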
Upvotes: 0
Reputation: 16309
Putting together the solutions proposed above, the following approach makes it possible to control the spaCy download location:
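import os
import spacy
from spacy.cli import download

# Directory for saved models; override with the cache_dir environment variable.
cache_dir = os.getenv("cache_dir", "../../models")
model_path = "en_core_web_sm"
try:
    # Load the copy saved on a previous run.
    nlp = spacy.load(os.path.join(cache_dir, model_path))
except OSError:
    # First run: install the package via pip, then save a copy to cache_dir.
    download(model_path)
    nlp = spacy.load(model_path)
    nlp.to_disk(os.path.join(cache_dir, model_path))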
This way, from the second execution onward, the model is loaded from the requested location:
nlp = spacy.load(os.path.join(cache_dir,model_path))
Upvotes: 10
Reputation: 196
I can't seem to find any evidence that spaCy pays attention to the $SPACY_DATA_DIR environment variable, nor can I get the above --data-path or model.path (--model.path?) parameters to work when trying to download models to a particular place. For me this was an issue as I was trying to keep the models out of a Docker image so that they could be persisted or updated easily without rebuilding the image.
I eventually came to the following solution for using pre-trained models. Download the model as usual:
python -m spacy download en_core_web_lg
Then, in Python:
import spacy
nlp = spacy.load('en_core_web_lg')
nlp.to_disk('path/to/dir')
You can now load the model from the local directory via nlp = spacy.load('path/to/dir'). There's a suggestion in the documentation that you can download the models manually:
"You can place the model data directory anywhere on your local file system. To use it with spaCy, simply assign it a name by creating a shortcut link for the data directory."
But I can't make sense of what this means in practice (I have submitted an issue to spaCy).
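For the Docker case described above, one possible sketch is to point your own code at a mounted directory through an environment variable and check that it holds a saved model before loading it. The variable name SPACY_MODEL_DIR and the helper saved_model_dir are invented for this example; spaCy itself does not read that variable. The check relies on nlp.to_disk() writing a meta.json file at the top level of the target directory.

```python
import os
from pathlib import Path
from typing import Optional


def saved_model_dir(env_var: str = "SPACY_MODEL_DIR") -> Optional[Path]:
    """Return the directory named by env_var if it looks like a model saved with nlp.to_disk()."""
    # env_var is a hypothetical name read only by this helper, not by spaCy.
    value = os.environ.get(env_var)
    if not value:
        return None
    path = Path(value)
    # nlp.to_disk() writes a meta.json at the top level of the target directory.
    return path if (path / "meta.json").is_file() else None
```

If the helper returns a path, it can be passed to spacy.load(); if it returns None, the code can fall back to spacy.load('en_core_web_lg') and call nlp.to_disk() to populate the directory.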
Hope this helps anyone else trying to do something similar.
Upvotes: 12