user2740947

Reputation: 181

spaCy: How to create a document from sentence-tokenized text?

I have text that is already sentence-tokenized (split into sentences and tokens). How can I build a spaCy document from it while keeping those sentence boundaries?

Upvotes: 1

Views: 597

Answers (1)

user2740947

Reputation: 181

After a bit of research I came up with the following simple solution:

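import spacy

nlp = spacy.load('en')
# Text that is already split into sentences and tokens.
sents = [['sentence', 'one'], ['sentence', 'two']]
# Build the Doc from the flattened token list
# (tokens_from_list is deprecated as of spaCy v2).
doc = nlp.tokenizer.tokens_from_list([t for s in sents for t in s])
# Clear every sentence-start flag first ...
for t in doc:
    t.is_sent_start = False
# ... then mark the first token of each sentence.
i = 0
for s in sents:
    doc[i].is_sent_start = True
    i += len(s)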
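
For newer spaCy releases, where `tokens_from_list` is no longer available, the same idea can probably be expressed by passing the words and sentence-start flags straight to the `Doc` constructor. The following is only a sketch under assumptions: it relies on the `sent_starts` keyword of `Doc` (spaCy v3), uses a blank English pipeline instead of a trained model, and the helper names `flatten_with_sent_starts` and `build_doc` are made up for illustration.

```python
from typing import List, Tuple


def flatten_with_sent_starts(sents: List[List[str]]) -> Tuple[List[str], List[bool]]:
    """Flatten pre-tokenized sentences and flag the first token of each one."""
    words, starts = [], []
    for sent in sents:
        for j, word in enumerate(sent):
            words.append(word)
            starts.append(j == 0)  # True only for a sentence-initial token
    return words, starts


def build_doc(sents: List[List[str]]):
    """Build a spaCy Doc whose sentence boundaries follow `sents`."""
    # Imported here so the helper above stays usable without spaCy installed.
    import spacy
    from spacy.tokens import Doc

    # Assumption: a blank English pipeline is enough, since only its vocab is needed.
    nlp = spacy.blank("en")
    words, starts = flatten_with_sent_starts(sents)
    return Doc(nlp.vocab, words=words, sent_starts=starts)


if __name__ == "__main__":
    doc = build_doc([["sentence", "one"], ["sentence", "two"]])
    for sent in doc.sents:
        print(sent.text)
```

Unlike the index-based loop above, this version simply skips empty sentences instead of marking the wrong token as a sentence start.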

Upvotes: 1
