Harshita Pandey
Reputation: 21

spaCy taking much longer to run than before

spaCy is taking far too long to vectorize my sentences.

for question in Question_Set:
    sentence = nlp(question)

The dataset contains nearly 300k questions. Initially, this code took about 15 minutes to run. Now, when I run the same code, it takes nearly 4 hours.

Upvotes: 2

Views: 1064

Answers (1)

Branden Ciranni
Reputation: 492

You can use nlp.pipe instead of calling nlp() on each question in a loop.

for doc in nlp.pipe(texts, n_process=2, batch_size=2000):
    pass  # do something with doc here

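nlp.pipe processes the texts as a stream, in batches, and can spread the work across multiple processes. Set n_process to the number of worker processes you want (at most the number of cores your machine has) and batch_size to a reasonable number of texts per batch.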
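To make that concrete, here is a minimal sketch that wraps the loop in a function. token_counts is a hypothetical helper name, counting tokens is only a stand-in for whatever you actually do with each doc, and nlp is assumed to be your already loaded pipeline.

```python
from typing import Iterable, List


def token_counts(nlp, texts: Iterable[str], batch_size: int = 2000,
                 n_process: int = 1) -> List[int]:
    """Stream texts through nlp.pipe and keep one small result per doc.

    `nlp` is assumed to be a loaded spaCy pipeline (anything exposing a
    compatible .pipe method works for this sketch).
    """
    counts = []
    # nlp.pipe yields one Doc per input text, in input order, while
    # processing the texts internally in batches of `batch_size`.
    for doc in nlp.pipe(texts, batch_size=batch_size, n_process=n_process):
        counts.append(len(doc))  # len(doc) is the number of tokens
    return counts
```

With the data from the question this would be called as token_counts(nlp, Question_Set, n_process=2). On platforms where Python starts worker processes with spawn (Windows and macOS by default), code that uses n_process greater than 1 should run under an if __name__ == "__main__": guard.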

You can get an additional speed improvement by disabling the pipeline components that you do not need. For example:

for doc in nlp.pipe(texts, n_process=2, batch_size=2000, disable=['ner', 'lemmatizer']):
    pass  # do something with doc here
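If you are not sure what to pass to disable, one approach is to start from the components you actually need and disable the rest. The sketch below assumes a hypothetical helper, components_to_disable, and an example list of component names like the one nlp.pipe_names might report for a trained English pipeline; check nlp.pipe_names on your own pipeline, since the names vary by model and some components depend on others.

```python
def components_to_disable(pipe_names, needed):
    """Return the loaded components that are not in `needed`, in order."""
    needed = set(needed)
    return [name for name in pipe_names if name not in needed]


# Example names only (an assumption); use nlp.pipe_names for the real list.
pipe_names = ["tok2vec", "tagger", "parser", "attribute_ruler", "lemmatizer", "ner"]

disable = components_to_disable(pipe_names, needed=["tok2vec", "tagger"])
print(disable)  # ['parser', 'attribute_ruler', 'lemmatizer', 'ner']
```

The resulting list can then be passed as disable=disable to nlp.pipe.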

Find more on how to speed up the processing pipeline here: https://spacy.io/usage/processing-pipelines.

Upvotes: 3
