Reputation: 7893
In spaCy you can set extensions for documents like this:
from spacy.tokens import Doc
Doc.set_extension('chapter_id', default='')
doc = nlp('This is my text')
doc._.chapter_id = 'This is my ID'
However, I have thousands of text files to process, and spaCy recommends using nlp.pipe for this:
docs = nlp.pipe(array_of_texts)
How can I set my extension values when using pipe?
Upvotes: 0
Views: 382
Reputation: 7105
You probably want to enable the as_tuples keyword argument on nlp.pipe, which lets you pass in a list of (text, context) tuples and yields (doc, context) tuples. So you could do something like this:
data = [('Some text', 1), ('Some other text', 2)]

def process_text(data):
    for doc, chapter_id in nlp.pipe(data, as_tuples=True):
        doc._.chapter_id = chapter_id
        yield doc
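For completeness, here is a self-contained sketch of the same idea that should run without a trained model. It assumes a blank English pipeline from spacy.blank("en") (swap in your own nlp object), and the name process_texts and the "chapter-1" style IDs are made up for illustration. The force=True flag is only there so that re-running the registration does not raise an error.

```python
import spacy
from spacy.tokens import Doc

# Register the custom attribute once. force=True overwrites an existing
# registration, which is convenient when re-running the script or a notebook cell.
Doc.set_extension("chapter_id", default="", force=True)

# Assumption: a blank English pipeline (tokenizer only) is enough to illustrate
# the pattern. Replace with your own loaded pipeline.
nlp = spacy.blank("en")


def process_texts(pairs):
    """Yield docs whose chapter_id is taken from the context of each (text, context) pair."""
    for doc, chapter_id in nlp.pipe(pairs, as_tuples=True):
        doc._.chapter_id = chapter_id
        yield doc


# Hypothetical sample data: (text, chapter_id) pairs.
data = [("Some text", "chapter-1"), ("Some other text", "chapter-2")]

docs = list(process_texts(data))
chapter_ids = [doc._.chapter_id for doc in docs]
```

Note that process_texts is a generator, so nothing is processed until you iterate over it (here via list(...)). With thousands of files you would probably iterate lazily instead of building the whole list in memory.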
Upvotes: 1