Reputation: 21
The spacy module is taking too long to vectorize a sentence.
for question in Question_Set:
    sentence = nlp(question)
The dataset contains nearly 300k questions. Initially, this code took about 15 minutes to run. However, the same code now takes nearly 4 hours.
Upvotes: 2
Views: 1064
Reputation: 492
You can use nlp.pipe.
for doc in nlp.pipe(texts, n_process=2, batch_size=2000):
    # do something here
nlp.pipe allows for both multiprocessing and batching. You can specify the number of cores your machine has and a reasonable batch size.
An additional speed improvement can be achieved by disabling components of the nlp() pipeline that you do not need. For example,
for doc in nlp.pipe(texts, n_process=2, batch_size=2000, disable=['ner', 'lemmatizer']):
    # do something here
Find more on how to speed up the processing pipeline here: https://spacy.io/usage/processing-pipelines.
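To make the snippets above concrete, here is a minimal, self-contained sketch of batched processing with nlp.pipe. It uses spacy.blank("en") so it runs without a downloaded model; in your case you would load your own model (e.g. with spacy.load), and the texts list here is just placeholder data.

```python
import spacy

# Blank English pipeline: tokenizer only, no model download needed.
# In practice, replace this with spacy.load(...) for your model.
nlp = spacy.blank("en")

texts = ["How do I speed up spaCy?", "What is nlp.pipe?"] * 3

# nlp.pipe streams Doc objects lazily; batch_size controls how many
# texts are buffered per batch. Adding n_process=2 would also enable
# multiprocessing, which pays off on large datasets like 300k texts.
docs = list(nlp.pipe(texts, batch_size=2000))

print(len(docs))        # one Doc per input text
print(docs[0][0].text)  # tokens are available on each Doc
```

Calling list(...) here is only for demonstration; on a large dataset you would normally iterate over the generator directly to avoid holding all Doc objects in memory.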
Upvotes: 3