Finding the max depth of a set in a dictionary

Question

I have a dictionary where each key is a string and each value is a set of the other strings that contain that key (word chaining). I'm having trouble finding the max depth of the graph, which would be the set with the most elements in the dictionary, and I'm trying to print out that max graph as well.

Right now my code prints:

{'DOG': [],
 'HIPPOPOTIMUS': [],
 'POT': ['SUPERPOT', 'HIPPOPOTIMUS'],
 'SUPERPOT': []}
1

Here, 1 is my maximum dictionary depth. I was expecting the depth to be 2, but there appears to be only 1 layer to the graph of 'POT'.

How can I find the maximum value set from the set of keys in a dictionary?

import pprint

def dict_depth(d, depth=0):
    if not isinstance(d, dict) or not d:
        return depth
    print max(dict_depth(v, depth+1) for k, v in d.iteritems())


def main():
    for keyCheck in wordDict:
        for keyCompare in wordDict:
            if keyCheck in keyCompare:
                if keyCheck != keyCompare:
                    wordDict[keyCheck].append(keyCompare)

if __name__ == "__main__":
    #load the words into a dictionary
    wordDict = dict((x.strip(), []) for x in open("testwordlist.txt"))
    main()
    pprint.pprint (wordDict)
    dict_depth(wordDict)

testwordlist.txt:

POT
SUPERPOT
HIPPOPOTIMUS
DOG

Rob Kennedy · Accepted Answer

The "depth" of a dictionary will naturally be 1 plus the maximum depth of its entries. You've defined the depth of a non-dictionary to be zero. Since your top-level dictionary doesn't contain any dictionaries of its own, the depth of your dictionary is clearly 1. Your function reports that value correctly.
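To make that concrete, here is a minimal sketch in Python 3. It is a variant of the question's `dict_depth` that returns the value instead of printing it, and the two sample dictionaries `flat` and `nested` are made up for illustration. Only nested dictionaries add depth; lists of words do not.

```python
def dict_depth(d, depth=0):
    """Nesting depth of d; non-dicts and empty dicts add no further depth."""
    if not isinstance(d, dict) or not d:
        return depth
    return max(dict_depth(v, depth + 1) for v in d.values())

# Hypothetical sample data: same words, two different shapes.
flat = {"POT": ["SUPERPOT", "HIPPOPOTIMUS"], "DOG": []}   # dict of lists
nested = {"POT": {"SUPERPOT": {}}, "DOG": {}}             # dict of dicts

print(dict_depth(flat))    # 1
print(dict_depth(nested))  # 2
```

The question's data has the `flat` shape, so 1 is the only answer this kind of function can give for it.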

However, your function wasn't written for the data format you're giving it: it measures how deeply dictionaries are nested, whereas your data is a flat dictionary of lists. We can easily come up with inputs where the depth of substring chains is more than just one. For example:

DOG
DOGMA
DOGMATIC
DOGHOUSE
POT

Output of your current script:

{'DOG': ['DOGMATIC', 'DOGMA', 'DOGHOUSE'],
 'DOGHOUSE': [],
 'DOGMA': ['DOGMATIC'],
 'DOGMATIC': [],
 'POT': []}
1

I think you want to get 2 for that input because the longest substring chain is DOG → DOGMA → DOGMATIC, which contains two hops.

To get the depth of a dictionary as you've structured it, you want to calculate the chain length for each word. That's 0 if no other word contains it, and otherwise 1 plus the maximum chain length among the words that contain it, which gives us the following two functions:

def word_chain_length(d, w):
    if len(d[w]) == 0:
        return 0
    return 1 + max(word_chain_length(d, ww) for ww in d[w])

def dict_depth(d):
    print(max(word_chain_length(d, w) for w in d))

The word_chain_length function given here isn't particularly efficient. It may end up calculating the length of the same chain multiple times if a word appears in the lists of several other words. Dynamic programming is a simple way to mitigate that, which I'll leave as an exercise.
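One possible way to approach that exercise, and to print the longest chain itself as the question asks, is sketched below in Python 3. The names `build_graph` and `longest_chains` are hypothetical and not part of the original code. The sketch assumes the same dictionary-of-lists layout and caches each word's longest chain so it is computed only once.

```python
def build_graph(words):
    """Map each word to the other words that contain it as a substring."""
    return {w: [o for o in words if w in o and w != o] for w in words}

def longest_chains(graph):
    """Return {word: longest chain starting at word}, computing each word once."""
    cache = {}

    def chain(word):
        if word not in cache:
            best = []
            for nxt in graph[word]:
                candidate = chain(nxt)
                if len(candidate) > len(best):
                    best = candidate
            cache[word] = [word] + best
        return cache[word]

    for word in graph:
        chain(word)
    return cache

words = ["DOG", "DOGMA", "DOGMATIC", "DOGHOUSE", "POT"]
chains = longest_chains(build_graph(words))
longest = max(chains.values(), key=len)
print(len(longest) - 1, longest)  # 2 ['DOG', 'DOGMA', 'DOGMATIC']
```

The recursion always terminates because a word can only be contained in a strictly longer word, so the graph has no cycles. The depth is the number of hops, which is one less than the number of words in the chain.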
