ryk

Reputation: 153

Difference between spacy v3 en_core_web_trf pipeline and en_core_web_lg pipeline

I am doing some performance tests with spacy version 3 for right-sizing my instances in production. I am observing the following:

Observation:

| Model name | Time without NER | Time with NER | Comments |
|---|---|---|---|
| en_core_web_lg | 4.89 seconds | 21.9 seconds | NER adds 350% to the original time |
| en_core_web_trf | 43.64 seconds | 52.83 seconds | NER adds just 20% to the original time |

Why is there no significant difference between the with-NER and without-NER scenarios for the transformer model? Is NER just an incremental task on top of POS tagging in the case of en_core_web_trf?

Test environment: GPU instance

Test code:

import timeit

import spacy

assert spacy.__version__ == '3.0.3'
spacy.require_gpu()
texts = load_sample_texts()  # loads 10,000 texts from a file
assert len(texts) == 10000

def get_execution_time(nlp, texts, N):
    return timeit.timeit(stmt="[nlp(text) for text in texts]", 
                           globals={'nlp': nlp, 'texts': texts}, number=N) / N


#  load models
nlp_lg_pos = spacy.load('en_core_web_lg', disable=['ner', 'parser'])
nlp_lg_all = spacy.load('en_core_web_lg')
nlp_trf_pos = spacy.load('en_core_web_trf', disable=['ner', 'parser'])
nlp_trf_all = spacy.load('en_core_web_trf')

#  get execution time
print(f'nlp_lg_pos = {get_execution_time(nlp_lg_pos, texts, N=1)}')
print(f'nlp_lg_all = {get_execution_time(nlp_lg_all, texts, N=1)}')
print(f'nlp_trf_pos = {get_execution_time(nlp_trf_pos, texts, N=1)}')
print(f'nlp_trf_all = {get_execution_time(nlp_trf_all, texts, N=1)}')

Upvotes: 11

Views: 4198

Answers (1)

Ian Thompson

Reputation: 3305

Not an expert, but I think this may be due to the design of the pipelines.

Trained pipeline design

The docs for the sm/md/lg models state:

The ner component is independent with its own internal tok2vec layer. (sm/md/lg pipeline design)

And the docs for the trf models state:

In the transformer (trf) models, the tagger, parser and ner (if present) all listen to the transformer component.
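
One way to see this design difference yourself (a rough check on my part, not something from the docs) is to compare each pipeline's config: the ner model in the lg pipeline should define its own tok2vec architecture, while in the trf pipeline it should reference a listener that reuses the transformer's output. The exact architecture names printed below may vary with your spaCy / spacy-transformers versions.

import spacy

nlp_lg = spacy.load('en_core_web_lg')
nlp_trf = spacy.load('en_core_web_trf')

# The ner component's embedding sub-network is described in the pipeline config.
# In the lg pipeline this is expected to be a standalone tok2vec architecture
# (e.g. spacy.Tok2Vec.v2); in the trf pipeline it is expected to be a listener
# (e.g. spacy-transformers.TransformerListener.v1) that reuses the transformer.
print(nlp_lg.config['components']['ner']['model']['tok2vec']['@architectures'])
print(nlp_trf.config['components']['ner']['model']['tok2vec']['@architectures'])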

Shared embedding layers

Reusing the tok2vec layer between components can make your pipeline run a lot faster and result in much smaller models. However, it can make the pipeline less modular and make it more difficult to swap components or retrain parts of the pipeline. (shared vs independent)

| SHARED | INDEPENDENT |
|---|---|
| smaller: models only need to include a single copy of the embeddings | larger: models need to include the embeddings for each component |
| faster: embed the documents once for your whole pipeline | slower: rerun the embedding for each component |
| less composable: all components require the same embedding component in the pipeline | modular: components can be moved and swapped freely |

Experiments

  • NOTE 1: I'm using the 20 newsgroups text dataset and I don't have a GPU so times may vary, but results point in the same general direction.

  • NOTE 2: I am using:

    • spacy 3.5.4
    • en_core_web_lg 3.5.0
    • en_core_web_trf 3.5.0
    • scikit-learn 1.3.0
    • Python 3.11.4.
"""Replicating code as much as possible."""

import timeit

from sklearn.datasets import fetch_20newsgroups
import spacy  # 3.5.4


# spacy.require_gpu()  # I don't have a GPU available
bunch = fetch_20newsgroups(random_state=0)
texts = bunch.data

def get_execution_time(nlp, texts, N):
    return timeit.timeit(
        stmt="[nlp(text) for text in texts]",
        globals={'nlp': nlp, 'texts': texts},
        number=N
    ) / N

#  load models
nlp_lg_pos = spacy.load('en_core_web_lg', disable=['ner', 'parser'])
nlp_lg_all = spacy.load('en_core_web_lg')
nlp_trf_pos = spacy.load('en_core_web_trf', disable=['ner', 'parser'])
nlp_trf_all = spacy.load('en_core_web_trf')

#  get execution time
print(f'nlp_lg_pos = {get_execution_time(nlp_lg_pos, texts, N=1)}')
print(f'nlp_lg_all = {get_execution_time(nlp_lg_all, texts, N=1)}')
print(f'nlp_trf_pos = {get_execution_time(nlp_trf_pos, texts, N=1)}')
print(f'nlp_trf_all = {get_execution_time(nlp_trf_all, texts, N=1)}')

| Model name | Time without NER or Parser | Time with NER and Parser | Comments |
|---|---|---|---|
| en_core_web_lg | 8.48 seconds | 13.98 seconds | NER and Parser add 65% to the original time |
| en_core_web_trf | 387.67 seconds | 382.84 seconds | NER and Parser reduce time by 1% (negligible) |

It is difficult to modify an existing component and have it listen to the shared tok2vec or transformer layer without retraining. So instead I will replace the en_core_web_trf ner and parser component listeners with their own copies of the transformer layer. If the documentation is correct, this should cause the "Time with (Independent) NER and (Independent) Parser" results to be much slower than either of the previous en_core_web_trf results.

nlp_trf_all_independent = spacy.load('en_core_web_trf')

# make `ner` and `parser` components independent
nlp_trf_all_independent.replace_listeners("transformer", "ner", ["model.tok2vec"])
nlp_trf_all_independent.replace_listeners("transformer", "parser", ["model.tok2vec"])

print(f'nlp_trf_all_independent = {get_execution_time(nlp_trf_all_independent, texts, N=1)}')

| Model name | Time without NER or Parser | Time with (Independent) NER and (Independent) Parser | Comments |
|---|---|---|---|
| en_core_web_trf | 387.67 seconds | 1125.31 seconds | (Independent) NER and (Independent) Parser add 190% to the original time |

As you can see, making components independent, i.e. not sharing/listening to tok2vec/transformer layers, results in a slower (but more modular) pipeline. I believe this is why the en_core_web_lg model is noticeably slower when you add the ner component, as its ner is independent by default.
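
As a side check (not part of the timing runs above), replace_listeners is documented to update the pipeline config as well as the model, so the switch from shared to independent should also be visible by comparing the ner component's tok2vec section before and after. The expected values noted in the comments are assumptions and may differ across spacy-transformers versions.

# nlp_trf_all and nlp_trf_all_independent come from the snippets above.
shared = nlp_trf_all.config['components']['ner']['model'].get('tok2vec', {}).get('@architectures')
independent = nlp_trf_all_independent.config['components']['ner']['model'].get('tok2vec', {}).get('@architectures')
print(f'shared:      {shared}')       # expected: a listener architecture, e.g. a TransformerListener
print(f'independent: {independent}')  # expected: a standalone copy of the embedding layer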

Upvotes: 3
