I am trying to incorporate spaCy's dependency parser into legacy Java code through a web API.
All the other components (tokenizer, tagger, merged_words, NER) are handled by the legacy NLP code. I only want to apply spaCy 3's dependency parser, together with its dependency rule matcher.
I have tried the following approach:
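The dependency rule matcher referred to here is spaCy's `DependencyMatcher`. The sketch below is a rough illustration of how it consumes a parse. It uses `spacy.blank("en")` so that no trained model is needed, and it makes two assumptions. The heads and deps are passed to `Doc` by hand as an illustrative parse, not one produced by `en_core_web_sm`. The rule name `PARAM_VALUE` and its pattern are hypothetical.

```python
import spacy
from spacy.matcher import DependencyMatcher
from spacy.tokens import Doc

# Blank pipeline: only the vocab is used, no trained components run.
nlp = spacy.blank("en")

words = ["The", "heating_temperature", "was", "found", "to", "be", "500", "C"]
spaces = [True, True, True, True, True, True, True, False]
tags = ["DT", "NN", "VBD", "VBN", "TO", "VB", "CD", "NN"]
ents = ["O", "B-PARAMETER", "O", "O", "O", "O", "B-VALUE", "O"]
# Hypothetical, hand-written parse (absolute head indices), NOT en_core_web_sm output.
heads = [1, 3, 3, 3, 5, 3, 7, 5]
deps = ["det", "nsubjpass", "auxpass", "ROOT", "aux", "xcomp", "nummod", "attr"]

doc = Doc(nlp.vocab, words=words, spaces=spaces, tags=tags,
          heads=heads, deps=deps, ents=ents)

matcher = DependencyMatcher(nlp.vocab)
# Hypothetical rule: a PARAMETER entity that is a direct child of the root verb,
# plus a VALUE entity anywhere below that same verb.
pattern = [
    {"RIGHT_ID": "verb", "RIGHT_ATTRS": {"DEP": "ROOT"}},
    {"LEFT_ID": "verb", "REL_OP": ">", "RIGHT_ID": "param",
     "RIGHT_ATTRS": {"ENT_TYPE": "PARAMETER"}},
    {"LEFT_ID": "verb", "REL_OP": ">>", "RIGHT_ID": "value",
     "RIGHT_ATTRS": {"ENT_TYPE": "VALUE"}},
]
matcher.add("PARAM_VALUE", [pattern])

matches = matcher(doc)
for match_id, token_ids in matches:
    # token_ids follow the order of the nodes in the pattern: verb, param, value
    verb, param, value = (doc[i] for i in token_ids)
    print(nlp.vocab.strings[match_id], param.text, "->", value.text)
```

In the real setup, the heads and deps would come from spaCy's parser rather than being passed to `Doc` directly.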
import spacy
from spacy.tokens import Doc

nlp = spacy.load("en_core_web_sm")

sent = ["The heating_temperature was found to be 500 C"]
words = ["The", "heating_temperature", "was", "found", "to", "be", "500", "C"]
spaces = [True, True, True, True, True, True, True, False]
tags = ["DT", "NN", "VBD", "VBN", "TO", "VB", "CD", "NN"]
ents = ["O", "I-PARAMETER", "O", "O", "O", "O", "I-VALUE", "O"]
# Build the Doc from the tokens, tags and entities produced by the legacy code
doc = Doc(nlp.vocab, words=words, spaces=spaces, tags=tags, ents=ents)

# spacy.blank("en") could be used here too
nlp2 = spacy.load("en_core_web_sm", exclude=["attribute_ruler", "lemmatizer", "ner", "parser", "tagger"])
pipeWithParser = nlp2.add_pipe("parser", source=nlp)
processed_dep = pipeWithParser(doc)  # see the similar example in https://spacy.io/api/tagger#call
However, the resulting dependency tree is wrong: every word is attached to the first word with an nmod relation.
What am I missing? I could also use spaCy's tagger if required, but when I tried adding the tagger the same way, every token was tagged 'NN'.
The parser component in en_core_web_sm depends on the tok2vec component, so you need to run tok2vec on the doc before running parser for the parser to have the right input.
# Run tok2vec first: it stores its token vectors in doc.tensor, which the parser uses as input
doc = nlp2.get_pipe("tok2vec")(doc)
doc = nlp2.get_pipe("parser")(doc)