Jon

Reputation: 91

How to return given word and dependency using spacy

I am experimenting with spacy for information extraction and would like to return given tokens, such as object of preposition (pobj) and any compounds.

For the example below, I am trying to write code that returns 'radar swivel'.

[Image: example word dependencies]

So far I have tried:

#component/assy
import spacy

# load english language model
nlp = spacy.load('en_core_web_sm', disable=['ner','textcat'])

def component(text):
    doc = nlp(text)

    for token in doc:
        # extract object
        if (token.dep_=='pobj'):
            return(token.text)
        elif (token.dep_=='compound'):
            return(token.text)
        
df['Component'] = df['Text'].apply(lambda x: component(x))
df.head()

This returns the word 'swivel' but not the preceding compound 'radar'. Is there a way I can rewrite the code to detect the pobj and return it together with any associated compounds? Thanks!

Upvotes: 0

Views: 508

Answers (1)

Hannibal

Reputation: 316

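The return statements end the loop at the first match. That is why, when you reach the token that is the pobj, the function returns that single word and moves on to the next sentence without checking the pobj's compounds.
To fix that, you can use the following function. Once it finds a pobj, it looks at its children and keeps the ones whose dependency is compound. It is a generator, so wrap the call in list() to collect every match in the sentence: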

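def component(text):
    # nlp is the pipeline loaded in the question via spacy.load(...)
    doc = nlp(text)
    for token in doc:
        if token.dep_ == 'pobj':
            # collect the compound modifiers attached directly to this pobj
            compounds = [child.text for child in token.children if child.dep_ == 'compound']
            # join as one list so there is no leading space when there are no compounds
            yield " ".join(compounds + [token.text])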

df['Component'] = df['Text'].apply(lambda x: list(component(x)))
df.head()
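
If you want to check the logic without loading a model, here is a minimal sketch that runs on a hand-built parse. The `Tok` class, the helper names `pobj_phrases` and `first_pobj_phrase`, and the sample phrase "damage to the radar swivel" with its dependency labels are all assumptions made up for illustration. `Tok` only mimics the spaCy token attributes used here (`text`, `dep_`, `i`, `children`). Sorting by `i` keeps the words in sentence order.

```python
from dataclasses import dataclass, field


@dataclass
class Tok:
    """Hypothetical stand-in for a spaCy token (only the attributes used below)."""
    text: str
    dep_: str
    i: int  # position in the sentence
    children: list = field(default_factory=list)


def pobj_phrases(tokens):
    """Return every pobj joined with its direct compound children."""
    phrases = []
    for token in tokens:
        if token.dep_ == "pobj":
            compounds = [c for c in token.children if c.dep_ == "compound"]
            # sort by position so compounds and head come out in sentence order
            words = sorted(compounds + [token], key=lambda t: t.i)
            phrases.append(" ".join(t.text for t in words))
    return phrases


def first_pobj_phrase(tokens):
    """Return the first pobj phrase, or None if the sentence has no pobj."""
    phrases = pobj_phrases(tokens)
    return phrases[0] if phrases else None


# Hand-built parse (assumed labels) for: "damage to the radar swivel"
the = Tok("the", "det", 2)
radar = Tok("radar", "compound", 3)
swivel = Tok("swivel", "pobj", 4, [the, radar])
to = Tok("to", "prep", 1, [swivel])
damage = Tok("damage", "ROOT", 0, [to])
tokens = [damage, to, the, radar, swivel]

print(pobj_phrases(tokens))
print(first_pobj_phrase(tokens))
```

With a real pipeline, passing `nlp(text)` instead of the hand-built list should work the same way, since spaCy tokens expose these attributes. `first_pobj_phrase` may be handier for a DataFrame column because it gives one string per row rather than a list.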

Upvotes: 0
