Reputation: 397
I have a tree-structure separated by tabs and lines like this:
a
\t1
\t2
\t3
\t\tb
\t\tc
\t4
\t5
And I am looking to turn this into:
{
'name': 'a',
'children': [
{'name': '1'},
{'name': '2'},
{
'name': '3'
'children': [
{'name': 'b'},
{'name': 'c'}
]
},
{'name': '4'},
{'name': '5'}
]
}
for a d3.js collapsible tree data input. I am assuming I have to use recursion in some way but I cannot figure out how.
I have tried turning the input into a list like this:
[('a',0), ('1',1), ('2',1), ('3',1), ('b',2), ('c',2), ('4',1), ('5',1)]
using this code:
def parser():
#run from root `retail-tree`: `python3 src/main.py`
l, all_line_details = list(), list()
with open('assets/retail') as f:
for line in f:
line = line.rstrip('\n ')
splitline = line.split(' ')
tup = (splitline[-1], len(splitline)-1)
l.append(splitline)
all_line_details.append(tup)
print(tup)
return all_line_details
Here, the first element is the string itself and the second is the number of tabs there are in that line. Unsure of the recursion step to accomplish this. Appreciate any help!
Upvotes: 2
Views: 523
Reputation: 158
Working from the list structure you provided from your own parser
function:
def make_tree(lines, tab_count=0):
tree = []
index = 0
while index < len(lines):
if lines[index][1] == tab_count:
node = {"name": lines[index][0]}
children, lines_read = make_tree(lines[index + 1:], tab_count + 1)
if children:
node["children"] = children
index += lines_read
tree.append(node)
else:
break
index += 1
return tree, index
Test cases:
lines = [("a", 0), ("1", 1), ("2", 1), ("3", 1), ("b", 2), ("c", 2), ("4", 1), ("5", 1)]
test_1 = make_tree([("a", 0)])
assert test_1[0] == [{"name": "a"}], test_1
test_2 = make_tree([("a", 0), ("b", 1)])
assert test_2[0] == [{"name": "a", "children": [{"name": "b"}]}], test_2
test_3 = make_tree(lines)
expected_3 = [
{
"name": "a",
"children": [
{"name": "1"},
{"name": "2"},
{"name": "3", "children": [{"name": "b"}, {"name": "c"}]},
{"name": "4"},
{"name": "5"},
],
}
]
assert test_3[0] == expected_3, test_3
Note the output is wrapped in a list in case your source file has more than one root node (i.e. more than one line with no leading tabs), and also for neatness of the recursion.
Upvotes: 0
Reputation: 106891
You can use a function that uses re.findall
with a regex that matches a line as the name of the node, followed by 0 or more lines that start with a tab, grouped as the children, and then recursively builds the same structure for the children after stripping the first tab of each line from the children string:
import re
def parser(s):
output = []
for name, children in re.findall(r'(.*)\n((?:\t.*\n)*)', s):
node = {'name': name}
if children:
node.update({'children': parser(''.join(line[1:] for line in children.splitlines(True)))})
output.append(node)
return output
so that given:
s = '''a
\t1
\t2
\t3
\t\tb
\t\tc
\t4
\t5
'''
parser(s)[0]
returns:
{'name': 'a',
'children': [{'name': '1'},
{'name': '2'},
{'name': '3', 'children': [{'name': 'b'}, {'name': 'c'}]},
{'name': '4'},
{'name': '5'}]}
Upvotes: 2