Reputation: 24099
I wanted to train a spaCy model on the Swedish UD Treebank.
To do this, I followed the instructions on the spaCy page: https://spacy.io/usage/training#spacy-train-cli
The training itself runs fine, but at the end it tries to open a file that for some reason doesn't exist, at least not at this location.
USER@Ubuntu18:~/spacy_models/sv$ python -m spacy train sv models talbanken-json/sv_talbanken-ud-train.json talbanken-json/sv_talbanken-ud-dev.json
⚠ Output directory is not empty
This can lead to unintended side effects when saving the model. Please use an
empty directory or a different path instead. If the specified output path
doesn't exist, the directory will be created for you.
Training pipeline: ['tagger', 'parser', 'ner']
Starting with blank model 'sv'
Counting training words (limit=0)
Itn Tag Loss Tag % Dep Loss UAS LAS NER Loss NER P NER R NER F Token % CPU WPS
--- --------- -------- --------- ------ ------ --------- ------ ------ ------ ------- -------
1 23722.240 87.792 74889.173 67.796 56.740 0.000 0.000 0.000 0.000 100.000 15740
....
30 780.531 93.508 13930.092 81.295 75.906 0.000 0.000 0.000 0.000 100.000 15569
✔ Saved model to output directory
models/model-final
Traceback (most recent call last):
File "/home/USER/miniconda3/lib/python3.7/runpy.py", line 193, in _run_module_as_main
"__main__", mod_spec)
File "/home/USER/miniconda3/lib/python3.7/runpy.py", line 85, in _run_code
exec(code, run_globals)
File "/home/USER/miniconda3/lib/python3.7/site-packages/spacy/__main__.py", line 33, in <module>
plac.call(commands[command], sys.argv[1:])
File "/home/USER/miniconda3/lib/python3.7/site-packages/plac_core.py", line 328, in call
cmd, result = parser.consume(arglist)
File "/home/USER/miniconda3/lib/python3.7/site-packages/plac_core.py", line 207, in consume
return cmd, self.func(*(args + varargs + extraopts), **kwargs)
File "/home/USER/miniconda3/lib/python3.7/site-packages/spacy/cli/train.py", line 497, in train
best_model_path = _collate_best_model(meta, output_path, nlp.pipe_names)
File "/home/USER/miniconda3/lib/python3.7/site-packages/spacy/cli/train.py", line 559, in _collate_best_model
bests[component] = _find_best(output_path, component)
File "/home/USER/miniconda3/lib/python3.7/site-packages/spacy/cli/train.py", line 578, in _find_best
accs = srsly.read_json(epoch_model / "accuracy.json")
File "/home/USER/miniconda3/lib/python3.7/site-packages/srsly/_json_api.py", line 50, in read_json
file_path = force_path(location)
File "/home/USER/miniconda3/lib/python3.7/site-packages/srsly/util.py", line 21, in force_path
raise ValueError("Can't read file: {}".format(location))
ValueError: Can't read file: models/model-best/accuracy.json
The folder in question contains a meta.json file but no accuracy.json:
$ ls models/model-best/
meta.json ner parser tagger tokenizer vocab
Fortunately, the training of the model itself is not affected by this problem. Still, it would be nice if I could run it without crashing.
Is there anything I can do to fix this?
Upvotes: 0
Views: 1680
Reputation: 4593
I noticed that some entity types were present in the training data set but missing from the validation data set. Fixing that mismatch is how I solved the above issue.
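To check for that kind of mismatch, something like the following could be used. It is only a rough sketch: it assumes the data is in spaCy's v2 JSON training format, where each token dict carries a BILUO tag such as "U-LOC" under the key "ner". The helper names collect_ner_labels and labels_missing_from_dev are made up for this illustration and are not part of spaCy.

```python
import json


def collect_ner_labels(node):
    """Recursively collect entity labels from every "ner" tag in the data.

    Assumes BILUO-style tags such as "B-PER" or "U-LOC"; tags like "O" or "-"
    carry no label and are skipped.
    """
    labels = set()
    if isinstance(node, dict):
        tag = node.get("ner")
        if isinstance(tag, str):
            prefix, _, label = tag.partition("-")
            if prefix in {"B", "I", "L", "U"} and label:
                labels.add(label)
        for value in node.values():
            labels |= collect_ner_labels(value)
    elif isinstance(node, list):
        for item in node:
            labels |= collect_ner_labels(item)
    return labels


def labels_missing_from_dev(train_data, dev_data):
    """Return entity labels that occur in the training data but not in the dev data."""
    return collect_ner_labels(train_data) - collect_ner_labels(dev_data)


def load_json(path):
    """Load a JSON training file from disk."""
    with open(path, encoding="utf8") as f:
        return json.load(f)
```

With the files from the question, this could be called as labels_missing_from_dev(load_json("talbanken-json/sv_talbanken-ud-train.json"), load_json("talbanken-json/sv_talbanken-ud-dev.json")). An empty set means both files use the same entity labels.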
Upvotes: 0
Reputation: 11474
Heed the warning shown by the script and start with an empty output directory:
⚠ Output directory is not empty
This can lead to unintended side effects when saving the model. Please use an
empty directory or a different path instead. If the specified output path
doesn't exist, the directory will be created for you.
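If you want to guard against this before launching a long training run, a small check like the one below could help. It is a minimal sketch using only the standard library; the function name is_empty_or_missing is made up for this example, and "models" is the output path from the question.

```python
from pathlib import Path


def is_empty_or_missing(output_dir):
    """Return True if output_dir does not exist or contains no entries."""
    path = Path(output_dir)
    if not path.exists():
        return True
    return not any(path.iterdir())


if __name__ == "__main__":
    # "models" is the output directory used in the question.
    if is_empty_or_missing("models"):
        print("Output directory is empty or missing; safe to start training.")
    else:
        print("Output directory is not empty; pick another path or clear it first.")
```

Leftover model-best or modelN folders from an earlier run are exactly what the warning is about, so starting from a fresh path avoids mixing old and new results.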
Upvotes: 2