Reputation: 444
Does spaCy have an API for phrase* extraction, as one would do with word2phrase or the Phrases class from gensim? Thank you.
PS. By phrases I mean collocations, in the linguistic sense.
Upvotes: 5
Views: 3576
Reputation: 15593
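spaCy's noun chunks feature (base noun phrases derived from the dependency parse) is a useful form of phrase extraction, though quite different from gensim's Phrases or word2phrase, which detect collocations statistically from co-occurrence counts.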
```python
import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("Autonomous cars shift insurance liability toward manufacturers")
for chunk in doc.noun_chunks:
    # chunk text, its root token, the root's dependency label, and the root's head
    print(chunk.text, chunk.root.text, chunk.root.dep_,
          chunk.root.head.text)
```
Output:
```
Autonomous cars cars nsubj shift
insurance liability liability dobj shift
manufacturers manufacturers pobj toward
```
You can also use spaCy's rule-based matchers (Matcher, DependencyMatcher) to extract other kinds of phrases defined by part-of-speech sequences, dependency relations, or other token patterns.
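As a minimal sketch of the Matcher approach (spaCy v3 API), the pattern below pulls out an optional adjective followed by one or more nouns. To keep it runnable without downloading a model, the POS tags are assigned by hand; in practice they would come from a trained pipeline such as en_core_web_sm. The pattern name ADJ_NOUN_PHRASE is an arbitrary, hypothetical label.

```python
import spacy
from spacy.matcher import Matcher
from spacy.tokens import Doc
from spacy.util import filter_spans

# Blank English pipeline: vocab and tokenizer only, no trained tagger.
nlp = spacy.blank("en")

# Hand-assigned POS tags for illustration only; a trained pipeline
# such as en_core_web_sm would normally predict these.
words = ["Autonomous", "cars", "shift", "insurance", "liability", "toward", "manufacturers"]
pos = ["ADJ", "NOUN", "VERB", "NOUN", "NOUN", "ADP", "NOUN"]
doc = Doc(nlp.vocab, words=words, pos=pos)

# Pattern: an optional adjective followed by one or more nouns.
matcher = Matcher(nlp.vocab)
matcher.add("ADJ_NOUN_PHRASE", [[{"POS": "ADJ", "OP": "?"}, {"POS": "NOUN", "OP": "+"}]])

# The matcher returns overlapping matches ("insurance", "liability",
# "insurance liability", ...); keep only the longest non-overlapping spans.
phrases = filter_spans(matcher(doc, as_spans=True))
for span in phrases:
    print(span.text)
```

With the tags above this prints "Autonomous cars", "insurance liability" and "manufacturers". Other phrase types only need a different token pattern; patterns over dependency relations go through DependencyMatcher instead.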
Upvotes: 4
Reputation: 16660
Have you seen the PyTextRank or spacycaKE extensions to spaCy?
Both can help with phrase extraction, which is not possible directly with spaCy.
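If you want collocations in the gensim/word2phrase sense on top of spaCy's tokenizer, the statistical idea is small enough to sketch by hand. This is a minimal sketch, assuming already tokenized, lowercased sentences (for example [t.lower_ for t in doc] for each spaCy Doc). It uses the bigram score from Mikolov et al. (2013), (count(a b) - delta) / (count(a) * count(b)); gensim's default scorer additionally multiplies by the vocabulary size, so thresholds are not interchangeable. The functions score_bigrams and join_phrases, the delta and threshold values, and the toy corpus are all hypothetical illustrations, not spaCy or gensim APIs.

```python
from collections import Counter


def score_bigrams(sentences, delta=1):
    """Score adjacent word pairs as (count(a b) - delta) / (count(a) * count(b)).

    delta discounts pairs that occur only a handful of times.
    """
    unigrams, bigrams = Counter(), Counter()
    for tokens in sentences:
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    return {
        (a, b): (n_ab - delta) / (unigrams[a] * unigrams[b])
        for (a, b), n_ab in bigrams.items()
    }


def join_phrases(tokens, scores, threshold):
    """Merge, left to right, any adjacent pair whose score exceeds threshold."""
    out, i = [], 0
    while i < len(tokens):
        pair = tuple(tokens[i:i + 2])
        if len(pair) == 2 and scores.get(pair, 0.0) > threshold:
            out.append("_".join(pair))
            i += 2
        else:
            out.append(tokens[i])
            i += 1
    return out


# Toy corpus for illustration only.
sentences = [
    ["new", "york", "is", "big"],
    ["i", "love", "new", "york"],
    ["new", "york", "has", "parks"],
    ["a", "new", "idea"],
]
scores = score_bigrams(sentences)
print(join_phrases(["i", "love", "new", "york"], scores, threshold=0.1))
# -> ['i', 'love', 'new_york']
```

Here "new york" scores (3 - 1) / (4 * 3), about 0.167, while every pair seen once scores 0, so only "new york" is merged. Re-scoring the merged output and running another pass would let longer phrases form, in the spirit of the repeated passes used with word2phrase.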
Upvotes: 5