Reputation: 34784
I have an accounting tree that's stored with indents/spaces in the source:
Income
Revenue
IAP
Ads
Other-Income
Expenses
Developers
In-house
Contractors
Advertising
Other Expenses
There are a fixed number of levels, so I'd like to flatten the hierarchy by using 3 fields (actual data has 6 levels, simplified for example):
L1 L2 L3
Income
Income Revenue
Income Revenue IAP
Income Revenue Ads
Income Other-Income
Expenses Developers In-house
... etc
I can do this by checking the number of spaces prior to the account name:
for rownum in range(6,ws.max_row+1):
accountName = str(ws.cell(row=rownum,column=1).value)
indent = len(accountName) - len(accountName.lstrip(' '))
if indent == 0:
l1 = accountName
l2 = ''
l3 = ''
elif indent == 3:
l2 = accountName
l3 = ''
else:
l3 = accountName
w.writerow([l1,l2,l3])
Is there a more flexible way to achieve this based on the indentation of the current row compared to the previous row rather than assuming it's always 3 spaces per level? L1
will always have no indent, and we can trust that lower levels will be indented further than their parent, but maybe not always 3 spaces per level.
Update, ended up with this as the meat of the logic, since I ultimately wanted the account list with the content, it seemed easiest to just use the indent to decide whether to reset, append, or pop the list:
if indent == 0:
accountList = []
accountList.append((indent,accountName))
elif indent > prev_indent:
accountList.append((indent,accountName))
elif indent <= prev_indent:
max_indent = int(max(accountList,key=itemgetter(0))[0])
while max_indent >= indent:
accountList.pop()
max_indent = int(max(accountList,key=itemgetter(0))[0])
accountList.append((indent,accountName))
So at each row of output the accountList is complete.
Upvotes: 8
Views: 3809
Reputation: 531
Laurent's answer was very helpful but it was doing confusing things when I tried to create an array instead of printing directly to stdout. Here is how I was able to make it work:
from copy import deepcopy
def parse(f):
d = []
stack = []
for content in f.splitlines():
row = content.split(" ")
stack[:] = stack[: len(row) - 1] + [row[-1]]
s = deepcopy(stack)
d.append(s)
return d
Upvotes: 0
Reputation: 16730
You can mimick the way Python actually parses the indentation. First, create a stack that will contain the indentation levels. At each line:
indentation = []
indentation.append(0)
depth = 0
f = open("test.txt", 'r')
for line in f:
line = line[:-1]
content = line.strip()
indent = len(line) - len(content)
if indent > indentation[-1]:
depth += 1
indentation.append(indent)
elif indent < indentation[-1]:
while indent < indentation[-1]:
depth -= 1
indentation.pop()
if indent != indentation[-1]:
raise RuntimeError("Bad formatting")
print(f"{content} (depth: {depth})")
With a "test.txt" file whose content is as you provided:
Income
Revenue
IAP
Ads
Other-Income
Expenses
Developers
In-house
Contractors
Advertising
Other Expenses
Here is the output:
Income (depth: 0)
Revenue (depth: 1)
IAP (depth: 2)
Ads (depth: 2)
Other-Income (depth: 1)
Expenses (depth: 0)
Developers (depth: 1)
In-house (depth: 2)
Contractors (depth: 2)
Advertising (depth: 1)
Other Expense (depth: 1)
So, what can you do with this? Suppose you want to build nested lists. First, create a data stack.
And regardless, for each line, append the content to the list at the top of the data stack.
Here is the corresponding implementation:
for line in f:
line = line[:-1]
content = line.strip()
indent = len(line) - len(content)
if indent > indentation[-1]:
depth += 1
indentation.append(indent)
data.append([])
elif indent < indentation[-1]:
while indent < indentation[-1]:
depth -= 1
indentation.pop()
top = data.pop()
data[-1].append(top)
if indent != indentation[-1]:
raise RuntimeError("Bad formatting")
data[-1].append(content)
while len(data) > 1:
top = data.pop()
data[-1].append(top)
Your nested list is at the top of your data
stack.
The output for the same file is:
['Income',
['Revenue',
['IAP',
'Ads'
],
'Other-Income'
],
'Expenses',
['Developers',
['In-house',
'Contractors'
],
'Advertising',
'Other Expense'
]
]
This is rather easy to manipulate, although quite deeply nested. You can access the data by chaining the item accesses:
>>> l = data[0]
>>> l
['Income', ['Revenue', ['IAP', 'Ads'], 'Other-Income'], 'Expenses', ['Developers', ['In-house', 'Contractors'], 'Advertising', 'Other Expense']]
>>> l[1]
['Revenue', ['IAP', 'Ads'], 'Other-Income']
>>> l[1][1]
['IAP', 'Ads']
>>> l[1][1][0]
'IAP'
Upvotes: 11
Reputation: 22992
If the indentation is a fixed amount of spaces (3 spaces here), you can simplify the calculation of the indentation level.
note: I use a StringIO to simulate a file
import io
import itertools
content = u"""\
Income
Revenue
IAP
Ads
Other-Income
Expenses
Developers
In-house
Contractors
Advertising
Other Expenses
"""
stack = []
for line in io.StringIO(content):
content = line.rstrip() # drop \n
row = content.split(" ")
stack[:] = stack[:len(row) - 1] + [row[-1]]
print("\t".join(stack))
You get:
Income
Income Revenue
Income Revenue IAP
Income Revenue Ads
Income Other-Income
Expenses
Expenses Developers
Expenses Developers In-house
Expenses Developers Contractors
Expenses Advertising
Expenses Other Expenses
EDIT: indentation not fixed
If the indentation isn't fixed (you don't always have 3 spaces) like in the example below:
content = u"""\
Income
Revenue
IAP
Ads
Other-Income
Expenses
Developers
In-house
Contractors
Advertising
Other Expenses
"""
You need to estimate the shifting at each new line:
stack = []
last_indent = u""
for line in io.StringIO(content):
indent = "".join(itertools.takewhile(lambda c: c == " ", line))
shift = 0 if indent == last_indent else (-1 if len(indent) < len(last_indent) else 1)
index = len(stack) + shift
stack[:] = stack[:index - 1] + [line.strip()]
last_indent = indent
print("\t".join(stack))
Upvotes: 7