Reputation: 15810
spaCy (and CoreNLP, and other parsers) output dependency trees in which a node can have a varying number of children. In spaCy, for example, each node has .lefts and .rights attributes (multiple left branches and multiple right branches).
Pattern-matching algorithms are considerably simpler (and more efficient) when they work over trees whose nodes have a fixed arity.
Is there any standard transformation from these multi-trees into binary trees?
In this example, for instance, we have "publish" with two lefts (.lefts = [just, journal]) and one right (.rights = [piece]). Can such sentences generally be transformed into a strict binary-tree notation (where each node has 0 or 1 left branch and 0 or 1 right branch) without much loss of information, or are multi-trees essential to correctly carrying the information?
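For concreteness, here is a minimal sketch of how these attributes can be inspected (this assumes spaCy with the en_core_web_sm model installed; the sentence is just an illustration):

import spacy

nlp = spacy.load("en_core_web_sm")  # assumes this model is installed
doc = nlp("The journal just published a piece on parsing.")

root = [t for t in doc if t.dep_ == "ROOT"][0]
# .lefts / .rights are generators over the left- and right-side children,
# so a single node can branch any number of times on either side
print("head:  ", root.text)
print("lefts: ", [t.text for t in root.lefts])
print("rights:", [t.text for t in root.rights])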
Upvotes: 2
Views: 257
Reputation: 2247
There are different types of trees in language analysis, immediate constituent trees and dependency trees (though you wouldn't normally talk of trees in dependency grammar). The former are usually binary (though there is no real reason why they have to be), as each category gets split into two subcategories, e.g.
S -> NP VP
NP -> det N1
N1 -> adj N1 | noun
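As a rough sketch of how such rules yield binary-branching constituent trees (using NLTK here purely for illustration; the VP rule and the terminal words below are invented):

import nltk

# Toy grammar mirroring the rules above; the VP rule and the
# terminals are made up just so the example can be parsed.
grammar = nltk.CFG.fromstring("""
    S  -> NP VP
    NP -> det N1
    N1 -> adj N1 | noun
    VP -> verb NP
    det  -> 'the'
    adj  -> 'big'
    noun -> 'dog' | 'ball'
    verb -> 'chased'
""")

parser = nltk.ChartParser(grammar)
for tree in parser.parse("the big dog chased the ball".split()):
    tree.pretty_print()  # each phrase splits into at most two sub-constituents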
Dependencies are not normally binary in nature, so there is no simple way to transform them into binary structures. The only fixed convention is that each word will be dependent on exactly one other word, but it might itself have multiple words depending on it.
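A quick sketch of that asymmetry (again illustrative; assumes spaCy with the en_core_web_sm model installed):

import spacy

nlp = spacy.load("en_core_web_sm")  # assumes this model is installed
doc = nlp("The journal just published a long piece on parsing.")

for token in doc:
    # every token has exactly one .head (the root points to itself),
    # but .children can yield any number of dependents
    print(f"{token.text:<10} head={token.head.text:<10} "
          f"children={[c.text for c in token.children]}")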
So, the answer is basically "no".
Upvotes: 2