Reputation: 524
I have a nested list:
lists = [['a', 'b', 'c', 'd'],
['a', 'b', 'd', 'e'],
['a', 'b', 'd', 'f'],
['a', 'b', 'd', 'f', 'h', 'i']]
I know how to build a simple prefix-tree:
tree = {}
end = "END"
for lst in lists:
d = tree
for x in lst:
d = d.setdefault(x, {})
d[end] = {}
Result:
>>> from pprint import pprint
>>> pprint(tree)
{'a': {'b': {'c': {'d': {'END': {}}},
'd': {'e': {'END': {}},
'f': {'END': {}, 'h': {'i': {'END': {}}}}}}}}
Now I can recursively traverse that tree, and whenever a node has only a single child (a sub-dict with just a single element), join those nodes.
def join(d, pref=[]):
if end in d:
yield [' '.join(pref)] if pref else []
for k, v in d.items():
if len(v) == 1:
for x in join(v, pref + [k]): # add node to prefix
yield x # yield next segment
else:
for x in join(v, []): # reset prefix
yield [' '.join(pref + [k])] + x # yield node + prefix and next
Result:
>>> for x in join(tree):
... print(x)
...
['a b', 'c d']
['a b', 'd', 'e']
['a b', 'd', 'f']
['a b', 'd', 'f', 'h i']
What I need is an algorithm where only common pairs of elements become single node of the tree. Ideally, the minimum length of a node = n1, the maximum length of a node = n2. Desired output:
[['a b', 'c d'],
['a b', 'd e'],
['a b', 'd f'],
['a b', 'd f', 'h i']]
Upvotes: 0
Views: 251
Reputation: 1122372
Just alternate between joining and yielding:
def paired(d, _prefix=None):
if end in d:
yield [_prefix] if _prefix else []
for k, v in d.items():
item = [f'{_prefix} {k}'] if _prefix else []
for rest in paired(v, None if _prefix else k):
yield item + rest
So at each level, if _prefix
is set, produce a pair, otherwise leave that level 'empty' and recurse with the current key as the prefix.
This produces your expected output:
>>> for path in paired(tree):
... print(path)
...
['a b', 'c d']
['a b', 'd e']
['a b', 'd f']
['a b', 'd f', 'h i']
Upvotes: 2