Hamilton

Reputation: 678

Is there an elegant way to find compound noun-adjective pairs in a sentence using spaCy?

I recently came across spaCy and am quite interested in this Python library. For my use case, I want to extract compound noun-adjective pairs as key phrases from an input sentence. spaCy provides a lot of utilities for NLP tasks, but I couldn't find a satisfactory approach for this one. I looked at a very similar post on SO (related post), but its solution is not very efficient and doesn't work for custom input sentences.

Here are some of the input sentences:

sentence_1="My problem was with DELL Customer Service"
sentence_2="Obviously one of the most important features of any computer is the human interface."
sentence_3="The battery life seems to be very good and have had no issues with it."

Here is the code that I tried:

import spacy, en_core_web_sm
nlp=en_core_web_sm.load()

def get_compound_nn_adj(doc):
    compounds_nn_pairs = []
    sent = nlp(doc)  # the rest of the function refers to the parsed sentence as `sent`
    compounds = [token for token in sent if token.dep_ == 'compound']
    compounds = [nc for nc in compounds if nc.i == 0 or sent[nc.i - 1].dep_ != 'compound']
    if compounds:
        for token in compounds:
            pair_1, pair_2 = (False, False)
            noun = sent[token.i:token.head.i + 1]
            pair_1 = noun
            if noun.root.dep_ == 'nsubj':
                adj_list = [rt for rt in noun.root.head.rights if rt.pos_ == 'ADJ']
                if adj_list:
                    pair_2 = adj_list[0]
            if noun.root.dep_ == 'dobj':
                verb_root = [vb for vb in noun.root.ancestors if vb.pos_ == 'VERB']
                if verb_root:
                    pair_2 = verb_root[0]
            if pair_1 and pair_2:
                compounds_nn_pairs.append((pair_1, pair_2))  # append takes a single argument
    return compounds_nn_pairs

I am wondering what modifications the helper function above needs, because it doesn't return the compound noun-adjective pairs I expect. Does anyone have experience with spaCy? How can I improve the sketch above? Is there a better approach?

Desired output:

I expect to get the following compound noun-adjective pairs from each input sentence:

desired_output_1="DELL Customer Service"
desired_output_2="human interface"
desired_output_3="battery life"

Is there any way to get the expected output? What changes does the implementation above need? Any other thoughts? Thanks in advance!

Upvotes: 0

Views: 2814

Answers (3)

Extending the answers above, I would like to add that you can also get the context around a word by checking its children, first on the left and then on the right.

doc = nlp('this is your sentence here')
for w in doc:
    if w.pos_ == "NOUN":
        # Collect texts so the list holds strings only, in left-to-right order
        context = [j.text for j in w.lefts if j.pos_ in ["ADJ", "NOUN"]]
        context.append(w.text)
        context.extend([j.text for j in w.rights if j.pos_ in ["ADJ", "NOUN"]])
        print(context)

You can also check the whole subtree with the token.subtree attribute, but in my case it performed worse and returned nearly the whole sentence.

Upvotes: 1

Jack Parsons

Reputation: 161

I suspect that this has to be handled with a database of compound nouns. The status of "compound noun" comes from commonality of usage. So, maybe the various n-gram databases (like Google's) could be a source.
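As a rough illustration of the lookup idea, here is a minimal sketch that matches word n-grams against a list of known compounds. The KNOWN_COMPOUNDS set and the find_known_compounds helper are hypothetical stand-ins; a real lexicon would have to be built from an n-gram resource, and the whitespace tokenization is deliberately naive.

```python
# Hypothetical lexicon for illustration only; a real one would be
# derived from an n-gram resource, not hard-coded.
KNOWN_COMPOUNDS = {
    ("customer", "service"),
    ("battery", "life"),
    ("human", "interface"),
}


def find_known_compounds(sentence, lexicon=KNOWN_COMPOUNDS, max_len=3):
    """Return lexicon entries found in the sentence, preferring the
    longest match at each position."""
    # Naive whitespace tokenization with surrounding punctuation stripped.
    words = [w.strip(".,;:!?\"'") for w in sentence.split()]
    words = [w for w in words if w]
    found = []
    i = 0
    while i < len(words):
        match_len = 0
        # Try the longest window first so longer entries win over shorter ones.
        for n in range(min(max_len, len(words) - i), 1, -1):
            if tuple(w.lower() for w in words[i:i + n]) in lexicon:
                match_len = n
                break
        if match_len:
            found.append(" ".join(words[i:i + match_len]))
            i += match_len  # skip past the matched words
        else:
            i += 1
    return found
```

With this toy lexicon, "DELL Customer Service" would only yield "Customer Service", since the three-word entry is not listed. Coverage depends entirely on what the lexicon contains.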

Upvotes: 1

ahalt

Reputation: 610

It looks like spaCy is only detecting compound relations in sentences 1 and 3, and treating 2's as an amod relation. (Here's some quick code to check its parse: [(i, i.pos_, i.dep_) for i in nlp(sentence_1)]).

To get the compounds out of 1 and 3, try this:

for i in nlp(sentence_1):
    if i.pos_ in ["NOUN", "PROPN"]:
        comps = [j for j in i.children if j.dep_ == "compound"]
        if comps:
            print(comps, i)

For each noun or proper noun in the sentence, it checks the token's direct children for compound relations.

To cast a wider net that also picks up adjectives, you could look for adjectives and nouns among the word's children, not just compounds:

for i in nlp(sentence_2):
    if i.pos_ in ["NOUN", "PROPN"]:
        comps = [j for j in i.children if j.pos_ in ["ADJ", "NOUN", "PROPN"]]
        if comps:
            print(comps, i)
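To get a single phrase string rather than a printed list plus head, the modifiers can be joined with the head in sentence order. The following is a minimal sketch, not a tested recipe for every parse: the modifier_phrases and _left_modifiers names are made up for illustration, and restricting to the "compound" and "amod" dependency labels is an assumption based on the parses described above. It relies only on the token attributes pos_, dep_, children, i and text.

```python
def _left_modifiers(token, deps):
    """Recursively gather children with a qualifying dependency label
    that appear before the token in the sentence."""
    found = []
    for child in token.children:
        if child.i < token.i and child.dep_ in deps:
            # A modifier may carry its own modifiers (e.g. nested compounds).
            found.extend(_left_modifiers(child, deps))
            found.append(child)
    return found


def modifier_phrases(doc, deps=("compound", "amod")):
    """Return phrases made of a noun head plus its left-hand modifiers."""
    phrases = []
    for token in doc:
        # Skip non-nouns, and nouns that are themselves modifiers of
        # another word, so nested compounds are reported only once.
        if token.pos_ not in ("NOUN", "PROPN") or token.dep_ in deps:
            continue
        mods = _left_modifiers(token, deps)
        if mods:
            ordered = sorted(mods + [token], key=lambda t: t.i)
            phrases.append(" ".join(t.text for t in ordered))
    return phrases


# Usage, assuming `nlp` is a loaded spaCy pipeline as in the question:
# print(modifier_phrases(nlp(sentence_1)))
```

What comes back depends on the parse the model produces, so other adjective-noun pairs in the sentence may be returned alongside the desired phrase.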

Upvotes: 2
