Reputation: 1272
I'm working on an NLP problem: given a sentence with two entities, I need to generate a boolean for each word indicating whether it lies on the dependency path between those entities.
For example:
'A misty < e1 >ridge< /e1 > uprises from the < e2 >surge< /e2 >'
I want to iterate over the words and tell, for each one, whether it is on the dependency path between e1 and e2.
Two important notes:
-If you try to help me (first, thanks), don't bother with the XML markup < e1 > and < e2 >. I'm really interested in how to find whether a word is on the dependency path between any two given words with spaCy; I'll take care of choosing the words myself.
-As I'm not an NLP expert, I'm somewhat confused about the meaning of "on the dependency path", and I'm sorry if it is not clear enough (these are the words used by my tutor).
Thanks in advance
Upvotes: 2
Views: 2121
Reputation: 1272
I found my solution using that post.
It has an answer dedicated to spaCy.
My implementation for finding the dependency path between two words in a given sentence:
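import networkx as nx
import spacy

# Assumption: the original post did not show which model was loaded;
# any English pipeline with a dependency parser should work here.
nlp = spacy.load("en_core_web_sm")
doc = nlp("Ships carrying equipment for US troops are already waiting off the Turkish coast")

def shortest_dependency_path(doc, e1=None, e2=None):
    # Build an undirected graph: one edge per (head, child) dependency.
    # Nodes are keyed by token text, so repeated words share a node.
    edges = []
    for token in doc:
        for child in token.children:
            edges.append((token.text, child.text))
    graph = nx.Graph(edges)
    try:
        shortest_path = nx.shortest_path(graph, source=e1, target=e2)
    except (nx.NetworkXNoPath, nx.NodeNotFound):
        # No path, or one of the words is not in the sentence
        shortest_path = []
    return shortest_path

print(shortest_dependency_path(doc, 'Ships', 'troops'))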
Output:
['Ships', 'carrying', 'for', 'troops']
What it actually does is first build an undirected graph for the sentence, where words are the nodes and dependencies between words are the edges, and then find the shortest path between two nodes.
For my needs, I then just check, for each word, whether it is on the generated dependency path (the shortest path).
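In case it helps, here is a minimal sketch of that last step using only the standard library. It is not the code I actually ran: `path_flags` is a hypothetical helper, and the `heads` list is a hand-written, assumed parse of the example sentence rather than real spaCy output. With spaCy, the same list could presumably be obtained as `[tok.head.i for tok in doc]` for a single-sentence doc. Nodes are token indices instead of token texts, so a word that appears twice does not collapse into one node.

```python
from collections import deque

def path_flags(heads, source, target):
    """Flag each token index that lies on the path between source and target.

    heads[k] is the index of token k's head; the root points to itself
    (the same convention as spaCy's token.head.i).
    """
    # Undirected adjacency list keyed by token index, not token text.
    neighbours = {k: set() for k in range(len(heads))}
    for child, head in enumerate(heads):
        if child != head:
            neighbours[child].add(head)
            neighbours[head].add(child)

    # Breadth-first search from source, remembering where each node came from.
    previous = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            break
        for nxt in neighbours[node]:
            if nxt not in previous:
                previous[nxt] = node
                queue.append(nxt)

    # Target unreachable (e.g. it belongs to a different sentence).
    if target not in previous:
        return [False] * len(heads)

    # Walk back from target to source to collect the path.
    on_path = set()
    node = target
    while node is not None:
        on_path.add(node)
        node = previous[node]
    return [k in on_path for k in range(len(heads))]

words = ["A", "misty", "ridge", "uprises", "from", "the", "surge"]
heads = [2, 2, 3, 3, 3, 6, 4]  # hand-written, assumed parse (not spaCy output)
flags = path_flags(heads, words.index("ridge"), words.index("surge"))
for word, flag in zip(words, flags):
    print(word, flag)
```

Under that assumed parse, the flagged words are ridge, uprises, from and surge. Since a dependency parse is a tree, the shortest path between two tokens is unique, so the breadth-first search here and `nx.shortest_path` should agree.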
Upvotes: 3
Reputation: 13126
A dependency path is a way of describing how clauses are built within a sentence. spaCy has a really good example in their docs here, with the sentence Apple is looking at buying U.K. startup for $1 billion.
Pardon my lack of good visualization here, but to work through your example:
A misty ridge uprises from the surge.
In spaCy, we follow their example to get the dependencies:
import spacy
nlp = spacy.load('en_core_web_lg')
doc = nlp("A misty ridge uprises from the surge.")
for chunk in doc.noun_chunks:
    print(chunk.text, chunk.root.text, chunk.root.dep_, chunk.root.head.text)
This will get the "clauses" which make up your sentence. Your output will look like so:
Text | root.text | root.dep_ | root.head.text
A misty ridge uprises | uprises | ROOT | uprises
the surge | surge | pobj | from
chunk.text is the text that makes up your dependency clause (note, there may be overlap depending on sentence structure). root.text gives the root (or head) of the dependency tree. The head of the tree is a spaCy token object, and has children that you can iterate through to check if another token is on the dependency tree.
def find_dependencies(doc, word_to_check=None, dep_choice=None):
    """
    word_to_check is the word you'd like to see on the dependency tree,
    for example word_to_check="misty"

    dep_choice is the text of the item you'd like the dependency check
    to be against, for example dep_choice='ridge'
    """
    tokens, texts = [], []
    for tok in doc:
        tokens.append(tok)
        texts.append(tok.text)

    # grabs the index/indices of the token that you are interested in
    indices = [i for i, text in enumerate(texts) if text == dep_choice]

    words_in_path = []
    for i in indices:
        reference = tokens[i]
        # Token.children is an iterator over the token's direct dependents
        child_elements = [t.text for t in reference.children]
        if word_to_check in child_elements:
            words_in_path.append((word_to_check, reference))
    return words_in_path
The code isn't the prettiest, but it's one way to get a list of tuples pairing the word you want to check with its associated parent token. Hopefully that's helpful.
EDIT:
In the interest of tailoring a bit more to your use case (and massively simplifying what my original answer looks like):
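# This gives you a 'word': <spaCy Token object> lookup
# (if a word occurs more than once, the last occurrence wins)
tokens_lookup = {tok.text: tok for tok in doc}

# Compare by text: children yields Token objects, not strings
if "misty" in [child.text for child in tokens_lookup["ridge"].children]:
    pass  # Extra logic here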
Upvotes: 2