user_1177868

Reputation: 444

Phrase extraction with spaCy

Does spaCy have an API for phrase* extraction, similar to what word2phrase or the Phrases class from gensim provide? Thank you.

PS. By "phrases" I mean collocations in the linguistic sense.

Upvotes: 5

Views: 3576

Answers (2)

polm23

Reputation: 15593

spaCy's noun chunks feature is a useful form of phrase extraction, though quite different from gensim's Phrases or word2phrase.

import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("Autonomous cars shift insurance liability toward manufacturers")
# Print each noun chunk with its root token, the root's dependency label, and the root's head
for chunk in doc.noun_chunks:
    print(chunk.text, chunk.root.text, chunk.root.dep_,
          chunk.root.head.text)

Output:

Autonomous cars cars nsubj shift
insurance liability liability dobj shift
manufacturers manufacturers pobj toward

You can also use the rule-based matchers (Matcher, DependencyMatcher) to get other kinds of phrases, defined by part-of-speech sequences, dependency relations, or other criteria.
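
As a rough illustration of the part-of-speech route, here is a minimal sketch using spaCy's Matcher. To keep it runnable without downloading a model, it builds the Doc by hand with POS tags that I assigned myself (an assumption for the example); in practice you would get the Doc from nlp(text) with a loaded pipeline such as en_core_web_sm, and the tags would come from its tagger. The pattern name TWO_WORD_PHRASE and the pattern itself are only illustrative.

```python
import spacy
from spacy.matcher import Matcher
from spacy.tokens import Doc

# Blank English pipeline: tokenizer and vocab only, no trained model required.
nlp = spacy.blank("en")

# Hand-tagged sentence (assumption: with a real pipeline these tags
# would be predicted by the tagger rather than supplied manually).
words = ["Autonomous", "cars", "shift", "insurance", "liability",
         "toward", "manufacturers"]
pos = ["ADJ", "NOUN", "VERB", "NOUN", "NOUN", "ADP", "NOUN"]
doc = Doc(nlp.vocab, words=words, pos=pos)

# Illustrative pattern: an adjective or noun immediately followed by a noun.
matcher = Matcher(nlp.vocab)
pattern = [{"POS": {"IN": ["ADJ", "NOUN"]}}, {"POS": "NOUN"}]
matcher.add("TWO_WORD_PHRASE", [pattern])

# Each match is (match_id, start, end); slice the Doc to get the span text.
phrases = [doc[start:end].text for _, start, end in matcher(doc)]
print(phrases)
```

For this hand-tagged sentence the pattern should pick out "Autonomous cars" and "insurance liability". Note that this is still rule-based matching on linguistic annotations, not the frequency-based collocation scoring that gensim's Phrases performs.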

Upvotes: 4

sophros

Reputation: 16660

Have you seen the PyTextRank or spacycaKE extensions for spaCy?

Both can help with phrase extraction, which is not possible directly with spaCy.

Upvotes: 5
