Reputation: 377
I have been working with spaCy's DependencyMatcher and I find it very powerful. However, I have a question that I couldn't answer from the documentation. The result is a list with a separate tuple for each matched pair of tokens, all sharing the same match ID, rather than a single tuple that contains every matched token.
For example, here are the matches I am getting:
[(7324372616739864093, [1, 5]), (7324372616739864093, [1, 6]), (7324372616739864093, [1, 7]), (7324372616739864093, [1, 9]), (7324372616739864093, [1, 10]), (7324372616739864093, [1, 11]), (7324372616739864093, [1, 13]), (7324372616739864093, [1, 15])]
But I expect the matches to be:
[(7324372616739864093, [1, 5, 6, 7, 9, 10, 11, 13, 15])]
Here is the code. Can someone tell me what I am doing wrong?
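from spacy.matcher import DependencyMatcher

# Assumes `nlp` (a loaded spaCy pipeline) and `doc` (a parsed Doc) already exist.
matcher = DependencyMatcher(nlp.vocab)
pattern = [
    {
        # Anchor token: the noun "experience"
        "RIGHT_ID": "anchor_experience",
        "RIGHT_ATTRS": {"LOWER": "experience", "POS": "NOUN"},
    },
    {
        # Any noun, proper noun, or verb that is a descendant of the anchor
        "LEFT_ID": "anchor_experience",
        "REL_OP": ">>",
        "RIGHT_ID": "skills",
        "RIGHT_ATTRS": {"POS": {"IN": ["NOUN", "PROPN", "VERB"]}},
    },
]
matcher.add("EXPERIENCE", [pattern])
matches = matcher(doc)
print(matches)
for match_id, token_ids in matches:
    # token_ids[i] is the token matched by pattern[i]
    for i, token_id in enumerate(token_ids):
        print(pattern[i]["RIGHT_ID"] + ":", doc[token_id].text)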
Upvotes: 1
Views: 57
Reputation: 15613
In the DependencyMatcher output, each match contains one token index per dictionary in the input pattern. Your pattern has two dictionaries, so each match has two tokens, and a single doc can produce multiple matches.
This makes it easy to connect the match results back to the pattern. For your pattern there is no ambiguity, but for more complex patterns the one-to-one correspondence is helpful.
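If you do want the combined form from the question, you can merge the matches yourself after the fact. Below is a minimal sketch in plain Python, not part of the spaCy API: `merge_matches` is a hypothetical helper name, and it assumes the first index in each match belongs to the anchor (the first dictionary in the pattern). The input is the match list copied from the question.

```python
from collections import defaultdict


def merge_matches(matches):
    """Group (match_id, token_ids) pairs by match ID and anchor token.

    Hypothetical helper: assumes token_ids[0] is the anchor token,
    i.e. the token matched by the first dictionary in the pattern.
    """
    grouped = defaultdict(list)
    for match_id, token_ids in matches:
        anchor, *rest = token_ids
        grouped[(match_id, anchor)].extend(rest)
    # One tuple per (match_id, anchor): the anchor first, then the
    # remaining token indices, de-duplicated and sorted.
    return [
        (match_id, [anchor, *sorted(set(rest))])
        for (match_id, anchor), rest in grouped.items()
    ]


# Match list as printed in the question.
matches = [
    (7324372616739864093, [1, 5]), (7324372616739864093, [1, 6]),
    (7324372616739864093, [1, 7]), (7324372616739864093, [1, 9]),
    (7324372616739864093, [1, 10]), (7324372616739864093, [1, 11]),
    (7324372616739864093, [1, 13]), (7324372616739864093, [1, 15]),
]
merged = merge_matches(matches)
print(merged)
# [(7324372616739864093, [1, 5, 6, 7, 9, 10, 11, 13, 15])]
```

Note that the merged list no longer lines up index by index with the pattern, so the `pattern[i]["RIGHT_ID"]` lookup from the question only works on the original, unmerged matches.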
Upvotes: 1