Josh Flori

Reputation: 425

Where does spacy language model download?

I have a simple command:

python -m spacy download en_core_web

And I cannot for the life of me figure out where it downloads to. I searched for "en_core_web" but found absolutely nothing, anywhere. And I can't figure out what to search for to understand the syntax behind this command.

What do you even call this line? A python command line argument? I couldn't find what to search for to specify a download location.

Please help!

Upvotes: 37

Views: 21266

Answers (4)

Matt

Reputation: 509

I stumbled across the same question. The model path can be found through the _path attribute of a loaded spaCy model.

For instance, having completed the model download at the command line as follows:
python -m spacy download en_core_web_sm

then within the python shell:

import spacy
model = spacy.load("en_core_web_sm")
model._path

This will show you where the model has been installed in your system.
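As an alternative that does not load the model, here is a minimal sketch that asks Python's import machinery where a package lives. It assumes the model was installed as a regular importable package (which is what the pip-based download appears to do); package_install_dir is a hypothetical helper name, not part of spaCy.

```python
import importlib.util
from pathlib import Path
from typing import Optional


def package_install_dir(package_name: str) -> Optional[Path]:
    """Return the directory of an importable package, or None if it is not installed.

    Hypothetical helper for illustration; it only inspects the import system
    and never imports or loads the package itself.
    """
    spec = importlib.util.find_spec(package_name)
    if spec is None or spec.origin is None:
        return None
    # For a regular package, spec.origin is the path to its __init__.py,
    # so the parent directory is where the package is installed.
    return Path(spec.origin).parent


if __name__ == "__main__":
    # "en_core_web_sm" is the model name used in this answer.
    # Prints None if the model package is not installed in this environment.
    print(package_install_dir("en_core_web_sm"))
```

If this prints a directory inside site-packages, that is the installed location; the model data itself normally sits in a versioned subdirectory beneath it.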

If you want to download to a different location, I believe you can write the following at the command line (this is the older spaCy 1.x syntax, so it may not work in newer versions):
python -m spacy.en.download en_core_web_sm --data-path /some/dir

Hope that helps

Upvotes: 31

Rozinig

Reputation: 1

After having trouble finding the download directory, I found that, at least on Linux, the initial download (before the model is installed into the Python library) seems to be controlled by pip, and the python -m spacy download ... command accepts pip options. So --cache-dir <dir> changes the download location, and in my case --no-cache-dir fixed my problem of running out of space in the default download location. I believe the actual installed location will be alongside your other Python libraries, wherever those are, depending on your distribution, virtual environments and such.
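To make the pip options mentioned above concrete, here is a small sketch that builds the command line with the cache option appended. build_download_command is a hypothetical helper, and it assumes your spaCy version forwards extra arguments to pip as described in this answer.

```python
import sys
from typing import List, Optional


def build_download_command(model: str,
                           cache_dir: Optional[str] = None,
                           use_cache: bool = True) -> List[str]:
    """Build the argv for `python -m spacy download <model>` plus pip cache options.

    Hypothetical helper for illustration. Assumes extra arguments are passed
    through to pip, as this answer describes.
    """
    cmd = [sys.executable, "-m", "spacy", "download", model]
    if not use_cache:
        # Skip pip's download cache entirely (helps when the default cache disk is full).
        cmd.append("--no-cache-dir")
    elif cache_dir is not None:
        # Point pip's download cache at a different directory.
        cmd += ["--cache-dir", cache_dir]
    return cmd


if __name__ == "__main__":
    # Only prints the command; pass the list to subprocess.run(..., check=True) to execute it.
    print(" ".join(build_download_command("en_core_web_sm", use_cache=False)))
```

Note that these options only affect where pip keeps the downloaded archive, not where the model package ends up being installed.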

Upvotes: 0

loretoparisi

Reputation: 16309

Putting together the solutions proposed above, the following approach makes it possible to control the spaCy download location:

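import os
import spacy

# Directory to keep models in; override with the "cache_dir" environment variable
cache_dir = os.getenv("cache_dir", "../../models")
model_path = "en_core_web_sm"
try:
    # Load from the custom location if the model was saved there before
    nlp = spacy.load(os.path.join(cache_dir, model_path))
except OSError:
    # First run: download and install the package, then save a copy to cache_dir
    spacy.cli.download(model_path)
    nlp = spacy.load(model_path)
    nlp.to_disk(os.path.join(cache_dir, model_path))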

This way, from the second run onward, the model is available at the requested location:

nlp = spacy.load(os.path.join(cache_dir,model_path))

Upvotes: 10

JonR

Reputation: 196

I can't seem to find any evidence that spacy pays attention to the $SPACY_DATA_DIR environment variable, nor can I get the above --data-path or model.path (--model.path?) parameters to work when trying to download models to a particular place. For me this was an issue as I was trying to keep the models out of a Docker image so that they could be persisted or be updated easily without rebuilding the image.

I eventually came to the following solution for using pre-trained models:

  1. Run the download command as normal (i.e. python -m spacy download en_core_web_lg)
  2. In Python: import spacy and then nlp = spacy.load('en_core_web_lg')
  3. Now save this to the place you want it: nlp.to_disk('path/to/dir')

You can now load this from the local file via nlp=spacy.load('path/to/dir'). There's a suggestion in the documentation that you can download the models manually:

"You can place the model data directory anywhere on your local file system. To use it with spaCy, simply assign it a name by creating a shortcut link for the data directory."

But I can't make sense of what this means in practice (have submitted an 'issue' to spaCy).

Hope this helps anyone else trying to do something similar.

Upvotes: 12
