Reputation: 3119
I have a yaml file which I parse. I use a simple recursive function but does not work as I expected. If I call parse(content, 'mydrive/home/sample/aaaaa1.html')
I get the result home/sample/aaaaa1.html
. However, parse(content, 'mydrive/home/sample/sample3.html')
returns None
.
What am I doing wrong?
from ruamel.yaml import YAML as yaml
content = yaml().load(open(r'/home/doc/sample.yaml', 'r'))
def parse(content, path):
"""
Parse the YAML.
"""
for i in content:
if isinstance(i, dict):
for item in i:
if item == 'href':
if i[item] in path:
return i[item]
elif item == 'topics':
return parse(i[item], path)
elif item == 'placeholder':
pass
else:
print("I did not recognize", item)
else:
print("---- not a dictionary ----")
Here is the sample yaml:
- placeholder: Sample
- topics:
- placeholder: Sample
- topics:
- placeholder: Sample
- topics:
- href: home/sample/aaaaa1.html
- href: home/sample/aaaaa2.html
- placeholder: Sample
# Comment
- topics:
- href: home/sample/sample1.html
- href: home/sample/sample2.html
- href: home/sample/sample3.html
Upvotes: 1
Views: 2762
Reputation: 13898
Your parse()
function never hits the last branch of the topic in the tree. In particular, when it hits this line:
elif item.startswith('topics'):
return parse(i[item], path)
It'll only dive further into the inner layers but doesn't know how to get back out because you're always returning parse()
of the inner items. To demonstrate, if you add this else
line in:
if item == 'href':
if i[item] in path:
return i[item]
else: #Add this
return "I can't get out" #Add this
You'll realize your second call for sample3.html
is returning "I can't get out" because that's the end of your return chain. It's returning None
right now there's no else
if the item doesn't match the path.
A simple fix is change your topics
handling as follows:
elif item == 'topics':
result = parse(i[item], path)
if result: return result
So that you always check if the inner parse()
returns something. If it doesn't, don't return and continue on with the next item on the outer layer.
home/sample/aaaaa1.html
home/sample/sample3.html
I debugged this by adding 1/2/3 to your topics/placeholders and follow the debugger to see where the iteration stopped. It helps visualize the problem. IMHO (still a beginner) a cleaner way to do this is assign a return value in each check, but only return the value at the very end of the function so you avoid this debugging mess. BTW, thanks for this question. It's a learning process for me as well, and I learned the cautions of recursive functions. This is how I would have coded parse()
:
def parse(content, path):
"""
Parse the YAML.
"""
for i in content:
result = None # Default result to None so the return won't trigger.
if isinstance(i, dict):
for item in i:
if item == 'href':
if i[item] in path:
result = i[item] # Assign value instead of returning
elif item == 'topics':
result = parse(i[item], path) # Assign value instead of returning
elif item == 'placeholder':
pass
else:
print("I did not recognize", item)
else:
print("---- not a dictionary ----")
if result: return result # only return something if it has found a match.
I also would update the two print()
statements to actually handle the condition if it means something to your program. Printing is mostly useless I find unless you are debugging or want to monitor your console. I would either log them or return something to your program so the condition doesn't get pass unnoticed.
Upvotes: 2