user14497686

Error when building a custom model with spaCy

Issue

Following the official instructions, I'm trying to add an extra training dataset and train a model in a local CPU environment.

I have not changed the contents of the base_config.cfg or config.cfg files.

How can I fix these errors so that I can build a model and evaluate it?

Error

I'm not sure whether the first one is actually a problem, and I don't know how to fill in the config.cfg file.

  1. The config.cfg file was still empty even after running the command in the "Procedure so far" section below.

  2. The following error message was shown when running the train command.

ℹ Using CPU
✘ Error parsing config overrides
paths -> train   not a section value that can be overwritten

Code

$ python3 -m spacy train config.cfg --output ./output --paths.train train.spacy --paths.dev train.spacy
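
The override error is consistent with config.cfg being empty: `--paths.train` can only overwrite a value that already exists in the file. Below is a minimal sketch for checking this before training, using only Python's standard `configparser`. The helper name `missing_overrides` is hypothetical, and the check only approximates what spaCy does internally; it is not spaCy's own implementation.

```python
import configparser


def missing_overrides(config_text, dotted_keys):
    """Return the dotted keys (e.g. "paths.train") that are absent from the config text."""
    # Interpolation is disabled so values like ${paths.train} are read as plain strings.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(config_text)
    missing = []
    for key in dotted_keys:
        section, _, option = key.rpartition(".")
        if not parser.has_option(section, option):
            missing.append(key)
    return missing


# An empty config.cfg has no [paths] section, so both overrides are reported.
print(missing_overrides("", ["paths.train", "paths.dev"]))
# A config that defines both values reports nothing missing.
print(missing_overrides("[paths]\ntrain = null\ndev = null\n", ["paths.train", "paths.dev"]))
```

If the first call's result matches what you see for your own config.cfg, the train command cannot work until fill-config succeeds.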

Procedure so far

$ python3 -m spacy init fill-config base_config.cfg config.cfg

#former output
configparser.DuplicateSectionError: While reading from '<string>' [line 90]: section 'paths' already exists

#current output
  File "/usr/local/lib/python3.9/site-packages/spacy/util.py", line 137, in get
    raise RegistryError(
catalogue.RegistryError: [E893] Could not find function 'spacy.MultiHashEmbed.v2' in function registry 'architectures'. If you're using a custom function, make sure the code is available. If the function is provided by a third-party package, e.g. spacy-transformers, make sure the package is installed in your environment.

Available names: spacy-legacy.MaxoutWindowEncoder.v1, spacy-legacy.MishWindowEncoder.v1, spacy-legacy.TextCatEnsemble.v1, spacy-legacy.Tok2Vec.v1, spacy-legacy.WandbLogger.v1, spacy.CharacterEmbed.v1, spacy.EntityLinker.v1, spacy.HashEmbedCNN.v1, spacy.MaxoutWindowEncoder.v2, spacy.MishWindowEncoder.v2, spacy.MultiHashEmbed.v1, spacy.PretrainCharacters.v1, spacy.PretrainVectors.v1, spacy.Tagger.v1, spacy.TextCatBOW.v1, spacy.TextCatCNN.v1, spacy.TextCatEnsemble.v2, spacy.TextCatLowData.v1, spacy.Tok2Vec.v2, spacy.Tok2VecListener.v1, spacy.TorchBiLSTMEncoder.v1, spacy.TransitionBasedParser.v1, spacy.TransitionBasedParser.v2

The config.cfg file was empty even after executing the above code.
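
The E893 message lists `spacy.MultiHashEmbed.v1` as available while the config requests `spacy.MultiHashEmbed.v2`. That pattern may mean the installed spaCy is older than the release the config form targets; this is an assumption, not something the traceback confirms. Below is a small sketch (the helper `other_versions` is hypothetical) that pulls same-named variants out of the "Available names" list:

```python
def other_versions(requested, available_names):
    """Return registered names that match `requested` except for the version suffix."""
    # "spacy.MultiHashEmbed.v2" -> "spacy.MultiHashEmbed"
    base = requested.rpartition(".")[0]
    names = [name.strip() for name in available_names.split(",")]
    return [n for n in names if n != requested and n.rpartition(".")[0] == base]


# Shortened copy of the "Available names" list from the error above.
available = (
    "spacy.MaxoutWindowEncoder.v2, spacy.MultiHashEmbed.v1, "
    "spacy.Tok2Vec.v2, spacy.TransitionBasedParser.v2"
)
print(other_versions("spacy.MultiHashEmbed.v2", available))
```

If only a different version of the same architecture is registered, comparing the output of `python3 -m spacy info` with the spaCy version the config was generated for is a reasonable next step.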

base_config.cfg, downloaded from the form in the official instructions

# This is an auto-generated partial config. To use it with 'spacy train'
# you can run spacy init fill-config to auto-fill all default settings:
# python -m spacy init fill-config ./base_config.cfg ./config.cfg
[paths]
train = null
dev = null

[system]
gpu_allocator = null

[nlp]
lang = "en"
pipeline = ["tok2vec","ner"]
batch_size = 1000

[components]

[components.tok2vec]
factory = "tok2vec"

[components.tok2vec.model]
@architectures = "spacy.Tok2Vec.v2"

[components.tok2vec.model.embed]
@architectures = "spacy.MultiHashEmbed.v2"
width = ${components.tok2vec.model.encode.width}
attrs = ["ORTH", "SHAPE"]
rows = [5000, 2500]
include_static_vectors = false

[components.tok2vec.model.encode]
@architectures = "spacy.MaxoutWindowEncoder.v2"
width = 96
depth = 4
window_size = 1
maxout_pieces = 3

[components.ner]
factory = "ner"

[components.ner.model]
@architectures = "spacy.TransitionBasedParser.v2"
state_type = "ner"
extra_state_tokens = false
hidden_width = 64
maxout_pieces = 2
use_upper = true
nO = null

[components.ner.model.tok2vec]
@architectures = "spacy.Tok2VecListener.v1"
width = ${components.tok2vec.model.encode.width}

[corpora]

[corpora.train]
@readers = "spacy.Corpus.v1"
path = ${paths.train}
max_length = 0

[corpora.dev]
@readers = "spacy.Corpus.v1"
path = ${paths.dev}
max_length = 0

[training]
dev_corpus = "corpora.dev"
train_corpus = "corpora.train"

[training.optimizer]
@optimizers = "Adam.v1"

[training.batcher]
@batchers = "spacy.batch_by_words.v1"
discard_oversize = false
tolerance = 0.2

[training.batcher.size]
@schedules = "compounding.v1"
start = 100
stop = 1000
compound = 1.001

[initialize]
vectors = ${paths.vectors}

Answers (1)

polm23

It looks like you double-pasted the config or something? From the errors you'll note that it says you have two [paths] sections. About halfway through your file there's a comment like this:

# This is an auto-generated partial config. To use it with 'spacy train'

Try deleting everything from that comment down, then run the fill-config command again.
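
A minimal sketch of that cleanup in Python, assuming the file starts with the auto-generated comment and contains a second pasted copy that starts at the repeated comment line. The function names `keep_first_copy` and `has_duplicate_section` are illustrative and not part of spaCy. The standard `configparser` module is used only to confirm that the duplicate `[paths]` section is gone:

```python
import configparser

# Start of the comment line that opens each pasted copy of the config.
MARKER = "# This is an auto-generated partial config."


def keep_first_copy(text, marker=MARKER):
    """Drop everything from the second occurrence of `marker` onward."""
    first = text.find(marker)
    second = text.find(marker, first + 1) if first != -1 else -1
    return text if second == -1 else text[:second]


def has_duplicate_section(text):
    """True if strict parsing rejects the text because a section is repeated."""
    parser = configparser.ConfigParser(interpolation=None)
    try:
        parser.read_string(text)
    except configparser.DuplicateSectionError:
        return True
    return False


# Tiny stand-in for a double-pasted base_config.cfg.
one_copy = MARKER + "\n[paths]\ntrain = null\ndev = null\n"
double_pasted = one_copy + one_copy
print(has_duplicate_section(double_pasted))                   # True
print(has_duplicate_section(keep_first_copy(double_pasted)))  # False
```

After writing the cleaned text back to base_config.cfg, rerun `python3 -m spacy init fill-config base_config.cfg config.cfg`.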
