BlueBlue

Reputation: 101

Visualizing customized IOB tags with SpaCy DisplaCy

I have created a Doc object from a text file using spaCy:

doc = nlp.make_doc(raw_text)

I also have a list of custom IOB tags, one for each word in this Doc object:

['O', 'B-PER', 'I-PER', 'O', 'O', 'O', 'O', 'B-DATE', 'I-DATE', 'I-DATE', 'I-DATE', 'I-DATE', 'O', 'O', 'O', 'O', 'O', 'B-ORG', 'I-ORG']

I want to assign each of these tags to the corresponding word in the Doc object and visualize the result with displaCy (the tag list has the same length as the Doc object):

 displacy.render(doc, style="ent", jupyter=True)

However, I do not know how to achieve this. Previously, when I had the start and end character offsets for each label, I used the doc.char_span method:

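import spacy
from spacy import displacy

def Visualizer(file_name, path):
  nlp = spacy.blank("fa")
  # Read the raw text; the context manager closes the file afterwards
  with open(path + '/' + file_name + '.txt', 'r') as f:
    raw_text = f.read()
  doc = nlp.make_doc(raw_text)
  # tags: list of (start_char, end_char, label) tuples, defined elsewhere
  spans = tags
  ents = []
  for span_start, span_end, label in spans:
     ent = doc.char_span(span_start, span_end, label=label)
     # char_span returns None when the offsets do not match token boundaries
     if ent is None:
         continue
     ents.append(ent)
  doc.ents = ents
  displacy.render(doc, style="ent", jupyter=True)

Visualizer('test', path)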

How can I change my code so that it works with a list of per-token IOB tags instead of character offsets?

Upvotes: 3

Views: 768

Answers (1)

aab

Reputation: 11484

You can only assign ents directly from IOB tags by providing the whole list at once when initializing the Doc. Use Doc(ents=):

from spacy.tokens import Doc
doc = Doc(nlp.vocab, words=words, ents=iob_tags)
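As a minimal sketch of how this could look end to end: the words and tags below are made-up sample data (not from the question), and a blank English pipeline stands in for spacy.blank("fa"), since only its vocab is used here. With jupyter=False, displacy.render returns the HTML markup as a string instead of displaying it.

```python
import spacy
from spacy import displacy
from spacy.tokens import Doc

# Only the vocab is needed; the question uses spacy.blank("fa") instead.
nlp = spacy.blank("en")

# Hypothetical sample data: exactly one IOB tag per word.
words = ["Yesterday", "John", "Smith", "visited", "Acme", "Corp"]
iob_tags = ["O", "B-PER", "I-PER", "O", "B-ORG", "I-ORG"]
assert len(words) == len(iob_tags)

# Build the Doc from pre-tokenized words so the tags line up by position.
doc = Doc(nlp.vocab, words=words, ents=iob_tags)

entities = [(ent.text, ent.label_) for ent in doc.ents]
print(entities)

# In a notebook, pass jupyter=True to display the result inline.
html = displacy.render(doc, style="ent", jupyter=False)
```

Because the Doc is created from the annotated words themselves, there is no tokenizer step that could shift the tags onto the wrong tokens.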

Alternatively, you can convert the tags to entity spans with spacy.training.iob_utils.biluo_tags_to_spans and assign the result to doc.ents. That function expects BILUO tags, so IOB tags need to be converted first (for example with spacy.training.iob_to_biluo). However, the default tokenizer from nlp might not produce exactly the same words as your original annotation, so it's best to start from aligned words and ents; that way you can be sure each tag lands on the correct word.
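A rough sketch of the second route, for the case where the Doc already exists: the text and tags are made-up sample data, the blank English pipeline is an assumption for illustration, and the length check is one simple way to catch the tokenizer mismatch described above before assigning anything.

```python
import spacy
from spacy.training import biluo_tags_to_spans, iob_to_biluo

nlp = spacy.blank("en")  # assumption: stands in for the question's spacy.blank("fa")

# Hypothetical sample text and per-token IOB tags.
doc = nlp.make_doc("John Smith joined Acme Corp")
iob_tags = ["B-PER", "I-PER", "O", "B-ORG", "I-ORG"]

# The tokenizer may split words differently from the annotation,
# so verify the alignment first.
if len(doc) != len(iob_tags):
    raise ValueError(f"{len(doc)} tokens but {len(iob_tags)} tags")

# Convert IOB to BILUO, then BILUO tags to Span objects on this doc.
biluo_tags = iob_to_biluo(iob_tags)
doc.ents = biluo_tags_to_spans(doc, biluo_tags)

print([(ent.text, ent.label_) for ent in doc.ents])
```

A matching length does not prove that every token matches the annotated word, so comparing the token texts against the original words is a reasonable extra check.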

Upvotes: 1
