Reputation: 191

Spacy to extract specific noun phrase

Can I use spaCy in Python to find noun phrases (NPs) with specific neighbors? I want the noun phrases in my text that have a verb both before and after them.

Upvotes: 8

Answers (3)

Syauqi Haris

Reputation: 416

If you want to re-tokenize by merging phrases, I prefer this (rather than iterating over noun chunks):

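import spacy
nlp = spacy.load('en_core_web_sm')
# spaCy v2 syntax; in spaCy v3 this line is nlp.add_pipe('merge_noun_chunks')
nlp.add_pipe(nlp.create_pipe('merge_noun_chunks'))
doc = nlp(u"Autonomous cars shift insurance liability toward manufacturers")
# Each noun chunk is now a single token
for token in doc:
    print(token.text)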

The output will be:

Autonomous cars
shift
insurance liability
toward
manufacturers

I chose this approach because each merged phrase is still a token, so its token attributes remain available for further processing :)

Upvotes: 1

aerin

Reputation: 22724

From https://spacy.io/usage/linguistic-features#dependency-parse

You can use Noun chunks. Noun chunks are "base noun phrases" – flat phrases that have a noun as their head. You can think of noun chunks as a noun plus the words describing the noun – for example, "the lavish green grass" or "the world’s largest tech fund". To get the noun chunks in a document, simply iterate over Doc.noun_chunks.

In:
        import spacy
        nlp = spacy.load('en_core_web_sm')
        doc = nlp(u"Autonomous cars shift insurance liability toward manufacturers")
        for chunk in doc.noun_chunks:
            print(chunk.text)

Out:

        Autonomous cars
        insurance liability
        manufacturers

Upvotes: 2

DhruvPathak

Reputation: 43265

You can merge the noun phrases (so that they do not get tokenized separately).

Then analyse the dependency parse tree and check the POS of the neighbouring tokens.

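>>> # Written for spaCy 1.x/2.x; in v3, load 'en_core_web_sm' and merge spans with doc.retokenize()
>>> import spacy
>>> nlp = spacy.load('en')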
>>> sent = u'run python program run, to make this work'
>>> parsed = nlp(sent)
>>> list(parsed.noun_chunks)
[python program]
>>> for noun_phrase in list(parsed.noun_chunks):
...     noun_phrase.merge(noun_phrase.root.tag_, noun_phrase.root.lemma_, noun_phrase.root.ent_type_)
... 
python program
>>> [(token.text,token.pos_) for token in parsed]
[(u'run', u'VERB'), (u'python program', u'NOUN'), (u'run', u'VERB'), (u',', u'PUNCT'), (u'to', u'PART'), (u'make', u'VERB'), (u'this', u'DET'), (u'work', u'NOUN')]

By analysing the POS of adjacent tokens, you can pick out the noun phrases you want.
A better approach is to analyse the dependency parse tree and inspect the lefts and rights of the noun phrase, so that a match is still found even when punctuation or a token with another POS tag sits between the noun phrase and the verb.
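
The neighbour check itself can be sketched in plain Python on the (text, POS) pairs printed above, so it runs without a spaCy model. This is only an illustration: the helper names `nearest_pos` and `nouns_between_verbs` and the `ignore` parameter are made up for this sketch and are not part of spaCy, and skipping tags in `ignore` is a simple POS-window stand-in for the dependency-tree approach, not the same thing.

```python
def nearest_pos(tagged, start, step, ignore):
    """Return the POS of the closest token in direction `step` (+1 or -1),
    skipping tokens whose POS is in `ignore`; None if the sequence ends."""
    i = start + step
    while 0 <= i < len(tagged) and tagged[i][1] in ignore:
        i += step
    return tagged[i][1] if 0 <= i < len(tagged) else None


def nouns_between_verbs(tagged, ignore=frozenset()):
    """Return the text of every NOUN token that has a VERB on both sides.
    `tagged` is a list of (text, pos) pairs taken after noun chunks are merged."""
    hits = []
    for i, (text, pos) in enumerate(tagged):
        if pos != "NOUN":
            continue
        left = nearest_pos(tagged, i, -1, ignore)
        right = nearest_pos(tagged, i, +1, ignore)
        if left == "VERB" and right == "VERB":
            hits.append(text)
    return hits


# (text, POS) pairs copied from the merged output shown above
tagged = [("run", "VERB"), ("python program", "NOUN"), ("run", "VERB"),
          (",", "PUNCT"), ("to", "PART"), ("make", "VERB"),
          ("this", "DET"), ("work", "NOUN")]

print(nouns_between_verbs(tagged))  # ['python program']
```

Passing something like `ignore={"PUNCT"}` lets a noun phrase still match when a comma separates it from the verb. With a real `Doc`, the same pairs could be built as `[(t.text, t.pos_) for t in doc]` after merging.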

Upvotes: 13
