marw
marw

Reputation: 3119

Cant't get recursion go deeper in Yaml parsing

I have a yaml file which I parse. I use a simple recursive function but does not work as I expected. If I call parse(content, 'mydrive/home/sample/aaaaa1.html') I get the result home/sample/aaaaa1.html. However, parse(content, 'mydrive/home/sample/sample3.html') returns None.

What am I doing wrong?

from ruamel.yaml import YAML as yaml

content = yaml().load(open(r'/home/doc/sample.yaml', 'r'))

def parse(content, path):
    """
    Parse the YAML.
    """
    for i in content:
        if isinstance(i, dict):
            for item in i:
                if item == 'href':
                    if i[item] in path:
                        return i[item]
                elif item == 'topics':
                    return parse(i[item], path)
                elif item == 'placeholder':
                    pass
                else:
                    print("I did not recognize", item)
        else:
            print("---- not a dictionary ----")

Here is the sample yaml:

- placeholder: Sample
- topics:

    - placeholder: Sample
    - topics:
        - placeholder: Sample
        - topics:
            - href: home/sample/aaaaa1.html
            - href: home/sample/aaaaa2.html

    - placeholder: Sample
    # Comment
    - topics:     
        - href: home/sample/sample1.html
        - href: home/sample/sample2.html
        - href: home/sample/sample3.html

Upvotes: 1

Views: 2762

Answers (1)

r.ook
r.ook

Reputation: 13898

The problem:

Your parse() function never hits the last branch of the topic in the tree. In particular, when it hits this line:

elif item.startswith('topics'):
    return parse(i[item], path)

It'll only dive further into the inner layers but doesn't know how to get back out because you're always returning parse() of the inner items. To demonstrate, if you add this else line in:

if item == 'href':
    if i[item] in path:
        return i[item]
    else:  #Add this
        return "I can't get out" #Add this

You'll realize your second call for sample3.html is returning "I can't get out" because that's the end of your return chain. It's returning None right now there's no else if the item doesn't match the path.

The fix:

A simple fix is change your topics handling as follows:

elif item == 'topics':
    result = parse(i[item], path)
    if result: return result

So that you always check if the inner parse() returns something. If it doesn't, don't return and continue on with the next item on the outer layer.

The Output:

home/sample/aaaaa1.html
home/sample/sample3.html

My 2 cents:

I debugged this by adding 1/2/3 to your topics/placeholders and follow the debugger to see where the iteration stopped. It helps visualize the problem. IMHO (still a beginner) a cleaner way to do this is assign a return value in each check, but only return the value at the very end of the function so you avoid this debugging mess. BTW, thanks for this question. It's a learning process for me as well, and I learned the cautions of recursive functions. This is how I would have coded parse():

def parse(content, path):
    """
    Parse the YAML.
    """
    for i in content:
        result = None  # Default result to None so the return won't trigger.
        if isinstance(i, dict):
            for item in i:
                if item == 'href':
                    if i[item] in path:
                        result = i[item]  # Assign value instead of returning
                elif item == 'topics':
                    result = parse(i[item], path)  # Assign value instead of returning
                elif item == 'placeholder':
                    pass
                else:
                    print("I did not recognize", item)
        else:
            print("---- not a dictionary ----") 
        if result: return result    # only return something if it has found a match.

I also would update the two print() statements to actually handle the condition if it means something to your program. Printing is mostly useless I find unless you are debugging or want to monitor your console. I would either log them or return something to your program so the condition doesn't get pass unnoticed.

Upvotes: 2

Related Questions