Reputation: 206
spaCy's 'train' command takes a command-line option, --gpu 0, allowing a 'last minute' choice between training with the GPU and training without it, using the CPU only.
However, using the quickstart (https://spacy.io/usage/training#quickstart) to choose between GPU and CPU results in a major difference in the (base) configuration. In my case (dealing with NER), I get two different pipelines: ["tok2vec","ner"] when targeting the CPU and ["transformer","ner"] when targeting the GPU, each with a very different component setup in the rest of the config.
Since my GPU has only 6 GB of memory, I run out of GPU memory fairly fast, so I can't use it. But when I switch to using the CPU only, the training behavior of the two pipelines is vastly different:
The ["tok2vec","ner"] pipeline runs pretty much on a single core and trains my model (8,000 training docs, 2,000 dev/validation docs) in a couple of hours, notably faster than spaCy 2 (even with a GPU), though at times it uses a lot of memory (up to 30 GB).
The ["transformer","ner"] pipeline explodes into using up to 20 cores (on a machine with 40 logical cores), so I would expect it to run fast, but it appears to run forever. In an hour only the first 'epoch' completes, and then (on the next epoch) it crashes (see the traceback below). Since my data (DocBin files batching 100 'documents' each) is the same in both cases, the crash (an out-of-sequence B/I tag) is hard to explain; a quick sanity check over the DocBin files is sketched after the traceback.
My main question is: WHY is the pipeline different when targeting the GPU vs. the CPU? And where are the word vectors when targeting the GPU?
Crash: ...
File "C:\Work\ML\Spacy3\lib\site-packages\spacy\training\loop.py", line 98, in train
for batch, info, is_best_checkpoint in training_step_iterator:
File "C:\Work\ML\Spacy3\lib\site-packages\spacy\training\loop.py", line 194, in train_while_improving
nlp.update(
File "C:\Work\ML\Spacy3\lib\site-packages\spacy\language.py", line 1107, in update
proc.update(examples, sgd=None, losses=losses, **component_cfg[name])
File "spacy\pipeline\transition_parser.pyx", line 350, in spacy.pipeline.transition_parser.Parser.update
File "spacy\pipeline\transition_parser.pyx", line 604, in spacy.pipeline.transition_parser.Parser._init_gold_batch
File "spacy\pipeline\_parser_internals\ner.pyx", line 273, in spacy.pipeline._parser_internals.ner.BiluoPushDown.init_gold
File "spacy\pipeline\_parser_internals\ner.pyx", line 53, in spacy.pipeline._parser_internals.ner.BiluoGold.__init__
File "spacy\pipeline\_parser_internals\ner.pyx", line 69, in spacy.pipeline._parser_internals.ner.create_gold_state
File "spacy\training\example.pyx", line 240, in spacy.training.example.Example.get_aligned_ner
File "spacy\tokens\doc.pyx", line 698, in spacy.tokens.doc.Doc.ents.__get__
ValueError: [E093] token.ent_iob values make invalid sequence: I without B
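For reference, here is a minimal sketch of the sanity check mentioned above (the file name is illustrative; my real data is split across many DocBin files). Accessing doc.ents forces spaCy to validate the stored IOB sequence, so an 'I without B' annotation in the raw data should raise E093 right here, although the error could also be introduced later by tokenization alignment during training:

import spacy
from spacy.tokens import DocBin

# Blank pipeline just to provide a vocab for deserialization.
nlp = spacy.blank("en")

# Illustrative file name for one of the DocBin batches.
doc_bin = DocBin().from_disk("train_batch_000.spacy")

for i, doc in enumerate(doc_bin.get_docs(nlp.vocab)):
    try:
        # Touching doc.ents validates the ENT_IOB/ENT_TYPE annotations.
        _ = doc.ents
    except ValueError as err:
        print(f"doc {i}: {err}")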
Upvotes: 1
Views: 3597
Reputation: 15623
Basically, if you choose "GPU" in the quickstart, spaCy uses the Transformers pipeline, which is architecturally quite different from the CPU pipeline. The quickstart settings are the recommended base settings; the settings spaCy can actually use are much broader (and the --gpu flag at training time is one of them).
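You can see the difference directly in the generated configs. A minimal sketch, assuming the two quickstart configs were saved as config_cpu.cfg and config_gpu.cfg (illustrative file names):

import spacy

# Load the two base configs generated by the quickstart widget.
cpu_cfg = spacy.util.load_config("config_cpu.cfg")
gpu_cfg = spacy.util.load_config("config_gpu.cfg")

# The CPU config embeds with a shared tok2vec layer, the GPU config with a transformer.
print(cpu_cfg["nlp"]["pipeline"])  # e.g. ['tok2vec', 'ner']
print(gpu_cfg["nlp"]["pipeline"])  # e.g. ['transformer', 'ner']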
Transformers use attention to generate contextual embeddings, so there's no real concept of a single embedding for a word. These contextual embeddings are typically better than word embeddings. The spaCy Transformers models don't include word embeddings for this reason. The downside to Transformers is that they require pretty powerful hardware, including a GPU, to run. If you do have a powerful GPU, it usually makes sense to use Transformers.
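That also answers the "where are the vectors" part: the transformer pipeline ships no static word-vector table, and its token representations are computed on the fly and stored per document. A minimal sketch, assuming the pretrained en_core_web_lg and en_core_web_trf packages are installed:

import spacy

nlp_lg = spacy.load("en_core_web_lg")    # CPU pipeline with static vectors
nlp_trf = spacy.load("en_core_web_trf")  # transformer pipeline

# The lg model has a static vector table; the trf model's table is empty.
print(nlp_lg.vocab.vectors.shape)   # non-zero, e.g. (n_keys, 300)
print(nlp_trf.vocab.vectors.shape)  # empty - no static word vectors

# The transformer's contextual output is attached to each processed Doc
# by spacy-transformers instead.
doc = nlp_trf("The embeddings are computed per context, not per word.")
print(type(doc._.trf_data))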
The models used by the CPU pipeline don't require specialized hardware and are in general much faster to run, while still providing sufficient accuracy for many applications. If you don't have a GPU, they're also basically your only option. If you do have a GPU, you can use it to train non-Transformers pipelines, and it may provide a speedup, but the benefits are typically not dramatic. So spaCy supports training non-Transformers models on GPU, but if you have a GPU it's usually better to use Transformers.
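If you do want to flip the choice at the last minute for a non-transformer pipeline, the programmatic counterpart of the training flag is to request the GPU before loading anything. A minimal sketch (spacy.prefer_gpu falls back to the CPU if no usable GPU is found):

import spacy

# Activate the GPU if available; returns False and stays on CPU otherwise.
on_gpu = spacy.prefer_gpu()
print("Using GPU" if on_gpu else "Using CPU")

# The non-transformer pipeline runs on either device.
nlp = spacy.load("en_core_web_sm")
doc = nlp("Device selection happens before the pipeline is loaded.")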
Upvotes: 5