Reputation: 153
I am doing some performance tests with spaCy version 3 to right-size my instances in production, and I am observing the following.
Observation:
| Model name | Time without NER | Time with NER | Comments |
|---|---|---|---|
| en_core_web_lg | 4.89 seconds | 21.9 seconds | NER adds 350% to the original time |
| en_core_web_trf | 43.64 seconds | 52.83 seconds | NER adds just 20% to the original time |
Why is there no significant difference between the with-NER and without-NER scenarios in the case of the transformer model? Is NER just an incremental task after POS tagging in the case of en_core_web_trf?
Test environment: GPU instance
Test code:
```python
import timeit

import spacy

assert spacy.__version__ == '3.0.3'
spacy.require_gpu()

texts = load_sample_texts()  # loads 10,000 texts from a file
assert len(texts) == 10000


def get_execution_time(nlp, texts, N):
    return timeit.timeit(stmt="[nlp(text) for text in texts]",
                         globals={'nlp': nlp, 'texts': texts}, number=N) / N


# load models
nlp_lg_pos = spacy.load('en_core_web_lg', disable=['ner', 'parser'])
nlp_lg_all = spacy.load('en_core_web_lg')
nlp_trf_pos = spacy.load('en_core_web_trf', disable=['ner', 'parser'])
nlp_trf_all = spacy.load('en_core_web_trf')

# get execution time
print(f'nlp_lg_pos = {get_execution_time(nlp_lg_pos, texts, N=1)}')
print(f'nlp_lg_all = {get_execution_time(nlp_lg_all, texts, N=1)}')
print(f'nlp_trf_pos = {get_execution_time(nlp_trf_pos, texts, N=1)}')
print(f'nlp_trf_all = {get_execution_time(nlp_trf_all, texts, N=1)}')
```
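As an aside, not part of the measurement above: spaCy can also batch documents with `nlp.pipe`, which typically changes absolute timings (especially on a GPU). A minimal sketch of a batched variant of the timing helper, for comparison only (the `batch_size` value is an arbitrary assumption):

```python
# Hypothetical batched variant of get_execution_time, using nlp.pipe
# instead of calling nlp(text) once per document. Batching generally
# matters most for GPU throughput; absolute numbers will differ from
# the per-document loop above.
def get_execution_time_batched(nlp, texts, N, batch_size=64):
    return timeit.timeit(
        stmt="list(nlp.pipe(texts, batch_size=batch_size))",
        globals={'nlp': nlp, 'texts': texts, 'batch_size': batch_size},
        number=N
    ) / N
```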
Upvotes: 11
Views: 4198
Reputation: 3305
Not an expert, but I think this may be due to the design of the pipelines.
The docs for the sm/md/lg models state:

> The `ner` component is independent with its own internal tok2vec layer.

And the docs for the trf model state:

> In the transformer (`trf`) models, the `tagger`, `parser` and `ner` (if present) all listen to the `transformer` component.
The docs also explain the trade-off of sharing an embedding layer:

> Reusing the tok2vec layer between components can make your pipeline run a lot faster and result in much smaller models. However, it can make the pipeline less modular and make it more difficult to swap components or retrain parts of the pipeline.
| SHARED | INDEPENDENT |
|---|---|
| ✅ smaller: models only need to include a single copy of the embeddings | ❌ larger: models need to include the embeddings for each component |
| ✅ faster: embed the documents once for your whole pipeline | ❌ slower: rerun the embedding for each component |
| ❌ less composable: all components require the same embedding component in the pipeline | ✅ modular: components can be moved and swapped freely |
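One way to see this design difference in the packaged pipelines, without reading the configs by hand, is to print the embedding architecture each component is configured with. This is only a sketch: it assumes the standard config layout of the packaged v3 pipelines, and the exact key paths may differ between model versions.

```python
import spacy

# Sketch: show which embedding architecture each component is wired to.
# Assumes the standard config layout of the packaged v3 pipelines; the
# key paths below may differ between model versions.
for model_name in ("en_core_web_lg", "en_core_web_trf"):
    nlp = spacy.load(model_name)
    print(model_name)
    for pipe_name in ("tagger", "parser", "ner"):
        tok2vec_cfg = nlp.config["components"][pipe_name]["model"]["tok2vec"]
        print(f"  {pipe_name}: {tok2vec_cfg.get('@architectures')}")
```

A listener entry (a Tok2VecListener/TransformerListener architecture) would indicate that the component shares the pipeline's embedding component, while a full Tok2Vec block would indicate an independent internal layer.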
NOTE 1: I'm using the 20 newsgroups text dataset and I don't have a GPU, so times may vary, but results point in the same general direction.
NOTE 2: I am using:

- spacy 3.5.4
- en_core_web_lg 3.5.0
- en_core_web_trf 3.5.0
- scikit-learn 1.3.0
- Python 3.11.4

```python
"""Replicating code as much as possible."""
import timeit

from sklearn.datasets import fetch_20newsgroups
import spacy  # 3.5.4

# spacy.require_gpu()  # I don't have a GPU available

bunch = fetch_20newsgroups(random_state=0)
texts = bunch.data


def get_execution_time(nlp, texts, N):
    return timeit.timeit(
        stmt="[nlp(text) for text in texts]",
        globals={'nlp': nlp, 'texts': texts},
        number=N
    ) / N


# load models
nlp_lg_pos = spacy.load('en_core_web_lg', disable=['ner', 'parser'])
nlp_lg_all = spacy.load('en_core_web_lg')
nlp_trf_pos = spacy.load('en_core_web_trf', disable=['ner', 'parser'])
nlp_trf_all = spacy.load('en_core_web_trf')

# get execution time
print(f'nlp_lg_pos = {get_execution_time(nlp_lg_pos, texts, N=1)}')
print(f'nlp_lg_all = {get_execution_time(nlp_lg_all, texts, N=1)}')
print(f'nlp_trf_pos = {get_execution_time(nlp_trf_pos, texts, N=1)}')
print(f'nlp_trf_all = {get_execution_time(nlp_trf_all, texts, N=1)}')
```
| Model name | Time without NER or Parser | Time with NER and Parser | Comments |
|---|---|---|---|
| en_core_web_lg | 8.48 seconds | 13.98 seconds | NER and Parser add 65% to the original time |
| en_core_web_trf | 387.67 seconds | 382.84 seconds | NER and Parser reduce time by 1% (negligible) |
It is difficult to modify an existing component and have it listen to the shared `tok2vec` or `transformer` layer without retraining. So instead I will replace the `en_core_web_trf` `ner` and `parser` component listeners with their own copies of the transformer layer. If the documentation is correct, this should cause the "Time with (Independent) NER and (Independent) Parser" results to be much slower than either of the previous `en_core_web_trf` results.
```python
nlp_trf_all_independent = spacy.load('en_core_web_trf')

# make `ner` and `parser` components independent
nlp_trf_all_independent.replace_listeners("transformer", "ner", ["model.tok2vec"])
nlp_trf_all_independent.replace_listeners("transformer", "parser", ["model.tok2vec"])

print(f'nlp_trf_all_independent = {get_execution_time(nlp_trf_all_independent, texts, N=1)}')
```
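If `replace_listeners` works as the spaCy docs describe, it also rewrites the affected parts of the config, so a quick sanity check (again assuming the standard packaged config layout) is to look at the components' tok2vec entries afterwards:

```python
# Sanity check (sketch): after replace_listeners, `ner` and `parser` should
# reference their own copy of the embedding layer rather than a listener.
# This assumes replace_listeners updates nlp.config, as described in the
# spaCy docs, and that the standard packaged config layout is used.
for pipe_name in ("ner", "parser"):
    tok2vec_cfg = nlp_trf_all_independent.config["components"][pipe_name]["model"]["tok2vec"]
    print(f"{pipe_name}: {tok2vec_cfg.get('@architectures')}")
```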
| Model name | Time without NER or Parser | Time with (Independent) NER and (Independent) Parser | Comments |
|---|---|---|---|
| en_core_web_trf | 387.67 seconds | 1125.31 seconds | (Independent) NER and (Independent) Parser add 190% to the original time |
As you can see, making components independent, i.e. not sharing/listening to `tok2vec`/`transformer` layers, results in a slower (but more modular) pipeline. I believe this is the reason that the `en_core_web_lg` model is noticeably slower when you add the `ner` component, as it is independent by default.
Upvotes: 3