Reputation: 77
The aim is to extract the subtree (phrase) for every token in the sentence whose dependency label is 'nsubj'.
Here is the code I am using:
import spacy

# 'en' is the spaCy 2.x shortcut; spaCy 3+ needs a full model name such as 'en_core_web_sm'
nlp = spacy.load('en')
piano_doc = nlp('The alarm clock is, to many high school students, a wailing monstrosity whose purpose is to torture all who are sleep-deprived')
for token in piano_doc:
    if token.dep_ == 'nsubj':
        print(token.text, token.tag_, token.head.text, token.dep_)
        # token.subtree yields the token itself and all of its descendants
        subtree = token.subtree
        print([t.text for t in subtree])
        print('*' * 50)
The output we get is:
clock NN is nsubj
['The', 'alarm', 'clock']
purpose NN is nsubj
['whose', 'purpose']
who WP are nsubj
['who']
But the output I am expecting for each nsubj is the whole clause it belongs to, i.e.:
purpose NN is nsubj
['whose', 'purpose','is','to','torture']
who WP are nsubj
['who' ,'are' ,'sleep-deprived']
Upvotes: 1
Views: 2342
Reputation: 15633
As krisograbek mentioned, your understanding of a subtree differs from what a subtree is in spaCy, or in dependency parsing in general.
In dependency parsing, if you have a subject and a verb, the verb is the head. A token's subtree contains only the token and its descendants, so the subtree of the subject does not include the verb.
I am not sure exactly what you want, but maybe you should try token.head.subtree for the subject.
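To make the difference concrete, here is a minimal sketch in plain Python that does not need spaCy or a model. The parse (words, heads, deps) is hand-written for illustration and is an assumption, not real parser output, and the helpers subtree and text are hypothetical names. It mimics what token.subtree and token.head.subtree would give for a subject:

```python
# Hand-written toy parse (an assumption for illustration, not real spaCy output).
words = ["The", "alarm", "clock", "is", "a", "monstrosity"]
heads = [2, 2, 3, 3, 5, 3]  # index of each word's head; the root points to itself
deps = ["det", "compound", "nsubj", "ROOT", "det", "attr"]


def subtree(index):
    """Return the indices of a token and all of its descendants, in sentence order."""
    result = [index]
    for child, head in enumerate(heads):
        if head == index and child != index:
            result.extend(subtree(child))
    return sorted(result)


def text(indices):
    """Map token indices back to their words."""
    return [words[i] for i in indices]


for i, dep in enumerate(deps):
    if dep == "nsubj":
        # Subtree of the subject itself: the verb is its head, so it is excluded.
        print(words[i], "->", text(subtree(i)))
        # Subtree of the subject's head: the verb plus everything below it.
        print(words[heads[i]], "->", text(subtree(heads[i])))
```

Here the subtree of "clock" is only "The alarm clock", while the subtree of its head "is" covers the whole toy sentence. In the original sentence, the head of "purpose" is the verb of the relative clause, so token.head.subtree should be closer to the clause you expect, though the exact tokens depend on the parse the model produces.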
Upvotes: 1