Reputation: 101
I have a text file from which I created a Doc object using spaCy:
doc = nlp.make_doc(raw_text)
I also have a list of custom IOB tags, one for each token in this Doc object:
['O', 'B-PER', 'I-PER', 'O', 'O', 'O', 'O', 'B-DATE', 'I-DATE', 'I-DATE', 'I-DATE', 'I-DATE', 'O', 'O', 'O', 'O', 'O', 'B-ORG', 'I-ORG']
I want to assign each of these tags to the corresponding token in the Doc object and visualize the result using displaCy (the length of the tag list is equal to the length of the Doc object):
displacy.render(doc, style="ent", jupyter=True)
However, I do not know how to achieve this. Previously, when I had the start and end character index for each label, I used the doc.char_span method:
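import spacy
from spacy import displacy

def Visualizer(file_name, path):
    nlp = spacy.blank("fa")
    with open(path + '/' + file_name + '.txt', 'r', encoding='utf-8') as f:
        raw_text = f.read()
    doc = nlp.make_doc(raw_text)
    # tags is defined elsewhere: a list of (start_char, end_char, label) tuples
    spans = tags
    ents = []
    for span_start, span_end, label in spans:
        ent = doc.char_span(span_start, span_end, label=label)
        # char_span returns None if the offsets do not fall on token boundaries
        if ent is None:
            continue
        ents.append(ent)
    doc.ents = ents
    displacy.render(doc, style="ent", jupyter=True)

visualized_data = Visualizer('test', path)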
How can I change my code so that it works with a per-token tag list instead of character offsets?
Upvotes: 3
Views: 768
Reputation: 11484
You can only assign ents directly from IOB tags by providing the whole list at once when initializing the Doc. Use Doc(ents=):
from spacy.tokens import Doc
doc = Doc(nlp.vocab, words=words, ents=iob_tags)
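For a fuller picture, here is a minimal sketch of the whole flow, from aligned words and tags to rendered output. The words and tags are made-up sample data, and "en" is only a stand-in for the "fa" pipeline in the question; jupyter=False is used so that displacy.render returns the HTML as a string instead of displaying it.

```python
import spacy
from spacy import displacy
from spacy.tokens import Doc

# Blank pipeline: only the vocab is needed here ("en" is an illustrative choice)
nlp = spacy.blank("en")

# Hypothetical sample data: one IOB tag per word, already aligned
words = ["Yesterday", "John", "Smith", "visited", "Acme", "Corp"]
iob_tags = ["O", "B-PER", "I-PER", "O", "B-ORG", "I-ORG"]
assert len(words) == len(iob_tags)

# Build the Doc from the pre-tokenized words and set the entities in one step
doc = Doc(nlp.vocab, words=words, ents=iob_tags)

# Inspect the resulting entity spans
entities = [(ent.text, ent.label_) for ent in doc.ents]
print(entities)

# With jupyter=False, render returns the markup as a string
html = displacy.render(doc, style="ent", jupyter=False)
```

In a notebook you would pass jupyter=True, as in the question, to display the result inline.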
Alternatively, you can convert the IOB tags to ent spans and assign them to doc.ents. Note that spacy.training.iob_utils.biluo_tags_to_spans expects BILUO tags, so convert the IOB tags with spacy.training.iob_to_biluo first. However, since the default tokenizer from nlp might not produce exactly the same words as in your original annotation, it's best to start with aligned words and ents, so that you're sure the tags are aligned to the correct words.
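To illustrate what converting IOB tags to spans involves, here is a library-free sketch that groups tags into (start, end, label) token ranges with an exclusive end. The helper name iob_to_token_spans is hypothetical and not part of spaCy, and it is deliberately lenient: a stray I- tag with no preceding B- simply opens a new span.

```python
from typing import List, Tuple


def iob_to_token_spans(tags: List[str]) -> List[Tuple[int, int, str]]:
    """Group IOB tags into (start, end, label) token spans, end exclusive."""
    spans = []
    start = None   # token index where the open span began
    label = None   # label of the open span
    for i, tag in enumerate(tags):
        prefix, _, tag_label = tag.partition("-")
        # An I- tag with the same label continues the open span
        if prefix == "I" and start is not None and tag_label == label:
            continue
        # Any other tag closes the span that was open, if there was one
        if start is not None:
            spans.append((start, i, label))
            start, label = None, None
        # B- opens a new span; a stray I- is treated the same way
        if prefix in ("B", "I"):
            start, label = i, tag_label
    # Close a span that runs to the end of the sequence
    if start is not None:
        spans.append((start, len(tags), label))
    return spans


# The tag list from the question
tags = ['O', 'B-PER', 'I-PER', 'O', 'O', 'O', 'O', 'B-DATE', 'I-DATE',
        'I-DATE', 'I-DATE', 'I-DATE', 'O', 'O', 'O', 'O', 'O', 'B-ORG', 'I-ORG']
spans = iob_to_token_spans(tags)
print(spans)  # [(1, 3, 'PER'), (7, 12, 'DATE'), (17, 19, 'ORG')]
```

These are token indices, not character offsets, so they only make sense against a Doc whose tokens match the annotated words one-to-one.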
Upvotes: 1