Reputation: 153
I am doing some performance tests with spaCy version 3 to right-size my instances in production, and I am observing the following.
Observation:
| Model name | Time without NER | Time with NER | Comments |
|---|---|---|---|
| en_core_web_lg | 4.89 seconds | 21.9 seconds | NER adds 350% to the original time |
| en_core_web_trf | 43.64 seconds | 52.83 seconds | NER adds just 20% to the original time |
Why is there no significant difference between the with-NER and without-NER scenarios in the case of the transformer model? Is NER just an incremental task after POS tagging in the case of en_core_web_trf?
Test environment: GPU instance
Test code:
```python
import timeit

import spacy

assert spacy.__version__ == '3.0.3'
spacy.require_gpu()

texts = load_sample_texts()  # loads 10,000 texts from a file
assert len(texts) == 10000


def get_execution_time(nlp, texts, N):
    return timeit.timeit(stmt="[nlp(text) for text in texts]",
                         globals={'nlp': nlp, 'texts': texts}, number=N) / N


# load models
nlp_lg_pos = spacy.load('en_core_web_lg', disable=['ner', 'parser'])
nlp_lg_all = spacy.load('en_core_web_lg')
nlp_trf_pos = spacy.load('en_core_web_trf', disable=['ner', 'parser'])
nlp_trf_all = spacy.load('en_core_web_trf')

# get execution time
print(f'nlp_lg_pos = {get_execution_time(nlp_lg_pos, texts, N=1)}')
print(f'nlp_lg_all = {get_execution_time(nlp_lg_all, texts, N=1)}')
print(f'nlp_trf_pos = {get_execution_time(nlp_trf_pos, texts, N=1)}')
print(f'nlp_trf_all = {get_execution_time(nlp_trf_all, texts, N=1)}')
```
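As an aside, not part of the measurement above: spaCy can also batch documents with `nlp.pipe`, which typically changes absolute timings (especially on a GPU). A minimal sketch of a batched variant of the timing helper, for comparison only (the `batch_size` value is an arbitrary assumption):

```python
# Hypothetical batched variant of get_execution_time, using nlp.pipe
# instead of calling nlp(text) once per document. Batching generally
# matters most for GPU throughput; absolute numbers will differ from
# the per-document loop above.
def get_execution_time_batched(nlp, texts, N, batch_size=64):
    return timeit.timeit(
        stmt="list(nlp.pipe(texts, batch_size=batch_size))",
        globals={'nlp': nlp, 'texts': texts, 'batch_size': batch_size},
        number=N
    ) / N
```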
Upvotes: 11
Views: 4198
Reputation: 3305
Not an expert, but I think this may be due to the design of the pipelines.
The docs for the sm/md/lg models state:

> The `ner` component is independent with its own internal tok2vec layer.

And the docs for the trf model state:

> In the transformer (`trf`) models, the `tagger`, `parser` and `ner` (if present) all listen to the `transformer` component.
The docs also explain the trade-off of sharing an embedding layer:

> Reusing the tok2vec layer between components can make your pipeline run a lot faster and result in much smaller models. However, it can make the pipeline less modular and make it more difficult to swap components or retrain parts of the pipeline.
| SHARED | INDEPENDENT |
|---|---|
| ✅ smaller: models only need to include a single copy of the embeddings | ❌ larger: models need to include the embeddings for each component |
| ✅ faster: embed the documents once for your whole pipeline | ❌ slower: rerun the embedding for each component |
| ❌ less composable: all components require the same embedding component in the pipeline | ✅ modular: components can be moved and swapped freely |
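One way to see this design difference in the packaged pipelines, without reading the configs by hand, is to print the embedding architecture each component is configured with. This is only a sketch: it assumes the standard config layout of the packaged v3 pipelines, and the exact key paths may differ between model versions.

```python
import spacy

# Sketch: show which embedding architecture each component is wired to.
# Assumes the standard config layout of the packaged v3 pipelines; the
# key paths below may differ between model versions.
for model_name in ("en_core_web_lg", "en_core_web_trf"):
    nlp = spacy.load(model_name)
    print(model_name)
    for pipe_name in ("tagger", "parser", "ner"):
        tok2vec_cfg = nlp.config["components"][pipe_name]["model"]["tok2vec"]
        print(f"  {pipe_name}: {tok2vec_cfg.get('@architectures')}")
```

A listener entry (a Tok2VecListener/TransformerListener architecture) would indicate that the component shares the pipeline's embedding component, while a full Tok2Vec block would indicate an independent internal layer.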
NOTE 1: I'm using the 20 newsgroups text dataset and I don't have a GPU, so times may vary, but results point in the same general direction.
NOTE 2: I am using:

- spacy 3.5.4
- en_core_web_lg 3.5.0
- en_core_web_trf 3.5.0
- scikit-learn 1.3.0
- Python 3.11.4

```python
"""Replicating code as much as possible."""
import timeit

from sklearn.datasets import fetch_20newsgroups
import spacy  # 3.5.4

# spacy.require_gpu()  # I don't have a GPU available

bunch = fetch_20newsgroups(random_state=0)
texts = bunch.data


def get_execution_time(nlp, texts, N):
    return timeit.timeit(
        stmt="[nlp(text) for text in texts]",
        globals={'nlp': nlp, 'texts': texts},
        number=N
    ) / N


# load models
nlp_lg_pos = spacy.load('en_core_web_lg', disable=['ner', 'parser'])
nlp_lg_all = spacy.load('en_core_web_lg')
nlp_trf_pos = spacy.load('en_core_web_trf', disable=['ner', 'parser'])
nlp_trf_all = spacy.load('en_core_web_trf')

# get execution time
print(f'nlp_lg_pos = {get_execution_time(nlp_lg_pos, texts, N=1)}')
print(f'nlp_lg_all = {get_execution_time(nlp_lg_all, texts, N=1)}')
print(f'nlp_trf_pos = {get_execution_time(nlp_trf_pos, texts, N=1)}')
print(f'nlp_trf_all = {get_execution_time(nlp_trf_all, texts, N=1)}')
```
| Model name | Time without NER or Parser | Time with NER and Parser | Comments |
|---|---|---|---|
| en_core_web_lg | 8.48 seconds | 13.98 seconds | NER and Parser add 65% to the original time |
| en_core_web_trf | 387.67 seconds | 382.84 seconds | NER and Parser reduce time by 1% (negligible) |
It is difficult to modify an existing component and have it listen to the shared `tok2vec` or `transformer` layer without retraining. So instead I will replace the `en_core_web_trf` `ner` and `parser` component listeners with their own copies of the transformer layer. If the documentation is correct, this should cause the "Time with (Independent) NER and (Independent) Parser" results to be much slower than either of the previous `en_core_web_trf` results.
```python
nlp_trf_all_independent = spacy.load('en_core_web_trf')

# make `ner` and `parser` components independent
nlp_trf_all_independent.replace_listeners("transformer", "ner", ["model.tok2vec"])
nlp_trf_all_independent.replace_listeners("transformer", "parser", ["model.tok2vec"])

print(f'nlp_trf_all_independent = {get_execution_time(nlp_trf_all_independent, texts, N=1)}')
```
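If `replace_listeners` works as the spaCy docs describe, it also rewrites the affected parts of the config, so a quick sanity check (again assuming the standard packaged config layout) is to look at the components' tok2vec entries afterwards:

```python
# Sanity check (sketch): after replace_listeners, `ner` and `parser` should
# reference their own copy of the embedding layer rather than a listener.
# This assumes replace_listeners updates nlp.config, as described in the
# spaCy docs, and that the standard packaged config layout is used.
for pipe_name in ("ner", "parser"):
    tok2vec_cfg = nlp_trf_all_independent.config["components"][pipe_name]["model"]["tok2vec"]
    print(f"{pipe_name}: {tok2vec_cfg.get('@architectures')}")
```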
| Model name | Time without NER or Parser | Time with (Independent) NER and (Independent) Parser | Comments |
|---|---|---|---|
| en_core_web_trf | 387.67 seconds | 1125.31 seconds | (Independent) NER and (Independent) Parser add 190% to the original time |
As you can see, making components independent, i.e. not sharing/listening to `tok2vec`/`transformer` layers, results in a slower (but more modular) pipeline. I believe this is the reason that the `en_core_web_lg` model is noticeably slower when you add the `ner` component, as it is independent by default.
Upvotes: 3