Reputation: 3433
Following the spaCy documentation, I found this example:
https://spacy.io/usage/visualizers#span
import spacy
from spacy import displacy
from spacy.tokens import Span
text = "Welcome to the Bank of China."
nlp = spacy.blank("en")
doc = nlp(text)
doc.spans["sc"] = [
Span(doc, 3, 6, "ORG"),
Span(doc, 5, 6, "GPE"),
]
#displacy.serve(doc, style="span")
I cannot understand why the list of spans is stored under the key "sc". Every span already has a label (ORG, GPE, etc.), so why do you need yet another qualifier?
Also, after adding the spans to the doc, I cannot understand why iterating over doc.spans no longer yields Span objects:
for span in doc.spans:
    print(type(span))
That gives "str", whereas under
for span in doc.spans['sc']:
    print(type(span))
I find the actual Span objects. If every span already has a label and also sits in a list named "sc" (or whatever), what is this double labeling of spans for?
Upvotes: 0
Views: 1114
Reputation: 15593
Doc.spans is like a dictionary, where each key is a string and each value is a SpanGroup, which is basically a list of spans.
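As a minimal sketch of that behaviour, reusing the example from the question: iterating over Doc.spans yields the string keys (which is why the question saw "str"), while indexing by a key returns the SpanGroup holding the labelled spans. The variable names keys, group and labels are only illustrative.

```python
import spacy
from spacy.tokens import Span, SpanGroup

nlp = spacy.blank("en")
doc = nlp("Welcome to the Bank of China.")

# Assigning a plain list under a key; spaCy stores it as a SpanGroup.
doc.spans["sc"] = [
    Span(doc, 3, 6, "ORG"),
    Span(doc, 5, 6, "GPE"),
]

# Iterating over doc.spans walks the keys, like iterating over a dict.
keys = list(doc.spans)

# Indexing by key gives the group; iterating the group gives Span objects.
group = doc.spans["sc"]
labels = [(span.text, span.label_) for span in group]

print(keys)
print(type(group).__name__)
print(labels)
```

So the key names the group, and the label describes each individual span inside it.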
The reason Doc.spans is a dictionary, instead of just a single list of spans, is so that you can have different components add lists of spans for different reasons, or have a single component add different groups of spans.
For example, if you had a coreference component, it could use one SpanGroup for each "cluster", where a cluster is a list of spans that refer to the same thing. For the sentence "John Smith called from New York, he said it's raining there", ["John Smith", "he"] would be one cluster, and ["New York", "there"] would be another.
If you had a spancat component and also a coref component, they would both need to set spans on the Doc, but you wouldn't want those spans to get mixed up; Doc.spans allows you to keep things clean and separate.
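To make the separation concrete, here is a rough sketch that fills the groups by hand instead of running real components. The keys "coref_cluster_1" and "coref_cluster_2", the helper find_span, and the PERSON/GPE labels are all made up for illustration; an actual coref or spancat component would pick its own keys and labels.

```python
import spacy

nlp = spacy.blank("en")
doc = nlp("John Smith called from New York, he said it's raining there")

def find_span(doc, phrase, label=0):
    """Hypothetical helper: span for the first occurrence of phrase."""
    start = doc.text.index(phrase)
    span = doc.char_span(start, start + len(phrase), label=label)
    assert span is not None, f"{phrase!r} does not align with token boundaries"
    return span

# What a coreference component might store: one group per cluster, no labels.
doc.spans["coref_cluster_1"] = [find_span(doc, "John Smith"), find_span(doc, "he")]
doc.spans["coref_cluster_2"] = [find_span(doc, "New York"), find_span(doc, "there")]

# What a span categorizer might store: one group of labelled spans.
doc.spans["sc"] = [
    find_span(doc, "John Smith", "PERSON"),
    find_span(doc, "New York", "GPE"),
]

# Each key keeps its own spans, even though some cover the same tokens.
clusters = {key: [span.text for span in group] for key, group in doc.spans.items()}
print(clusters)
```

"John Smith" ends up in two groups at once, once as a cluster member and once as a labelled span, and neither entry interferes with the other.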
Upvotes: 1