Reputation: 91
I am experimenting with spaCy for information extraction and would like to return specific tokens, such as the object of a preposition (pobj) together with any compounds attached to it.
For the example below I am trying to write code that will return 'radar swivel'.
So far I have tried:
#component/assy
import spacy

# load english language model
nlp = spacy.load('en_core_web_sm', disable=['ner','textcat'])

def component(text):
    doc = nlp(text)
    for token in doc:
        # extract object
        if (token.dep_=='pobj'):
            return(token.text)
        elif (token.dep_=='compound'):
            return(token.text)

df['Component'] = df['Text'].apply(lambda x: component(x))
df.head()
This returns the word 'swivel' but not the preceding compound 'radar'. Is there a way I can rewrite the code to detect the pobj and return it together with any associated compounds? Thanks!
Upvotes: 0
Views: 508
Reputation: 316
The return statements exit the function as soon as the first matching token is found. That is why only a single token comes back: once a match is hit, the function moves on to the next text without collecting the compounds attached to the pobj.
To fix that, you can use the following function. Once it finds a pobj, it looks at its children and checks which ones are compounds:
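def component(text):
    doc = nlp(text)
    for token in doc:
        if (token.dep_=='pobj'):
            # collect the compound modifiers attached directly to this pobj
            compounds = [child.text for child in token.children if child.dep_ == "compound"]
            # join via a list so a pobj without compounds has no leading space
            yield " ".join(compounds + [token.text])

# each row gets a list, since a text can contain more than one pobj
df['Component'] = df['Text'].apply(lambda x: list(component(x)))
df.head()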
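To see the difference between the two versions without loading a spaCy model, here is a minimal sketch that swaps the real tokens for hand-written tuples. The TOKENS list, its dependency labels, and the helper names first_match and pobj_phrases are all made up for illustration and are not real parser output; the actual labels depend on how spaCy parses your sentence.

```python
# Stand-in for spaCy tokens: (text, dependency label, index of the head token).
# Hypothetical, hand-written labels for illustration only.
TOKENS = [
    ("inspect", "ROOT", 0),
    ("near", "prep", 0),
    ("the", "det", 4),
    ("radar", "compound", 4),
    ("swivel", "pobj", 1),
]


def first_match(tokens):
    """Mimics the early-return version: stops at the first matching token."""
    for text, dep, _head in tokens:
        if dep in ("pobj", "compound"):
            return text
    return None


def pobj_phrases(tokens):
    """Mimics the generator version: yields each pobj with its compound children."""
    for i, (text, dep, _head) in enumerate(tokens):
        if dep == "pobj":
            # children of token i are the tokens whose head index is i
            compounds = [t for t, d, h in tokens if d == "compound" and h == i]
            yield " ".join(compounds + [text])


early = first_match(TOKENS)
phrases = list(pobj_phrases(TOKENS))
print(early)
print(phrases)
```

The early-return version hands back only one word (whichever match comes first in the text), while the generator version keeps going and returns the full phrase for every pobj it finds.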
Upvotes: 0