vestland
vestland

Reputation: 61074

Dictionary: How to list every key path that contains a certain value?

Let's say I've got a nested dictionary of the form:

{'geo': {'bgcolor': 'white','lakecolor': 'white','caxis': {'gridcolor': 'white', 'linecolor': 'white',}},
 'title': {'x': 0.05},
 'yaxis': {'automargin': True,'linecolor': 'white','zerolinecolor': 'white','zerolinewidth': 2}
 }

How can you work your way through that dict and make a list of each complete key path that contains the value 'white'? Using a function defined by user jfs in the post Search for a value in a nested dictionary python lets you check whether or not 'white' occurs at least one time and also returns the path:

# dictionary
    d={'geo': {'bgcolor': 'white','lakecolor': 'white','caxis': {'gridcolor': 'white', 'linecolor': 'white',}},
    'title': {'x': 0.05},
    'yaxis': {'automargin': True,'linecolor': 'white','ticks': '','zerolinecolor': 'white','zerolinewidth': 2}
  }

# function:
def getpath(nested_dict, value, prepath=()):
    for k, v in nested_dict.items():
        path = prepath + (k,)
        if v == value: # found value
            return path
        elif hasattr(v, 'items'): # v is a dict
            p = getpath(v, value, path) # recursive call
            if p is not None:
                return p
getpath(d,'white')

# out:
('geo', 'bgcolor')

But 'white' occurs other places too, like in :

1. d['geo']['lakecolor']

2: d['geo']['caxis']['gridcolor']

3: d['yaxis']['linecolor']

How can I make sure that the function finds all paths?

I've tried applying the function above until it returns none while eliminating found paths one by one, but that quickly turned into an ugly mess.

Thank you for any suggestions!

Upvotes: 4

Views: 776

Answers (4)

Axel Andersson
Axel Andersson

Reputation: 1

I needed this functionality for traversing HDF files with h5py. This code is a slight alteration of the answer by user114332 which looks for keys instead of values, and additionally yields the needle in the result, in case it is useful to someone else.

import h5py
def find_paths(haystack, needle):
    if not isinstance(haystack, h5py.Group) and not isinstance(haystack, dict):
        return
    
    if needle in haystack:
        yield (needle,)
    
    for key, val in haystack.items():
        for subpath in find_paths(val, needle):
            yield (key, *subpath)

Execution:

sf = h5py.File("file.h5py", mode = "w")
g = sf.create_group("my group")
h = g.create_group("my2")
k = sf.create_group("two group")
l = k.create_group("my2")
a = l.create_group("my2")

for path in find_paths(sf, "my2"):
    print('found at path: ', path)

which prints the following

found at path:  ('my group', 'my2')
found at path:  ('two group', 'my2')
found at path:  ('two group', 'my2', 'my2')

Upvotes: 0

dumbass
dumbass

Reputation: 27208

This is a perfect use case to write a generator:

def find_paths(haystack, needle):
    if haystack == needle:
        yield ()
    if not isinstance(haystack, dict):
        return
    for key, val in haystack.items():
        for subpath in find_paths(val, needle):
            yield (key, *subpath)

You can use it as follows:

d = {
    'geo': {'bgcolor': 'white','lakecolor': 'white','caxis': {'gridcolor': 'white', 'linecolor': 'white',}},
    'title': {'x': 0.05},
    'yaxis': {'automargin': True,'linecolor': 'white','ticks': '','zerolinecolor': 'white','zerolinewidth': 2}
}

# you can iterate over the paths directly...
for path in find_paths(d, 'white'):
    print('found at path: ', path)

# ...or you can collect them into a list:
paths = list(find_paths(d, 'white'))
print('found at paths: ' + repr(paths))

The generator approach has the advantage that it doesn't need to create an object to keep all paths in memory at once; they can be processed one by one and immediately discarded. In this case, the memory savings would be rather modest, but in others they may be significant. Also, if a loop iterating over a generator is terminated early, the generator is not going to keep searching for more paths that would be later discarded anyway.

Upvotes: 5

JacobIRR
JacobIRR

Reputation: 8946

Returning is what makes the result incomplete. Instead of returning, use a separate list to track your paths. I'm using list cur_list here, and returning it at the very end of the loop:

d = {
  'geo': {'bgcolor': 'white',
         'caxis': {'gridcolor': 'white', 'linecolor': 'white'},
         'lakecolor': 'white'},
  'title': {'x': 0.05},
  'yaxis': {'automargin': True,
           'linecolor': 'white',
           'ticks': '',
           'zerolinecolor': 'white',
           'zerolinewidth': 2}
}

cur_list = []

def getpath(nested_dict, value, prepath=()):
    for k, v in nested_dict.items():
        path = prepath + (k,)
        if v == value: # found value
            cur_list.append(path)
        elif isinstance(v, dict): # v is a dict
            p = getpath(v, value, path, cur_list) # recursive call
            if p is not None:
                cur_list.append(p)

getpath(d,'white')
print(cur_list)


# RESULT:
# [('geo', 'bgcolor'), ('geo', 'caxis', 'gridcolor'), ('geo', 'caxis', 'linecolor'), ('geo', 'lakecolor'), ('yaxis', 'linecolor'), ('yaxis', 'zerolinecolor')]

Upvotes: 0

Jean-François Fabre
Jean-François Fabre

Reputation: 140168

just transform your function so it returns a list and don't return when something is found. Just add to/extend the list

def getpath(nested_dict, value, prepath=()):
    p = []
    for k, v in nested_dict.items():
        path = prepath + (k,)
        if v == value: # found value
            p.append(path)
        elif hasattr(v, 'items'): # v is a dict
            p += getpath(v, value, path) # recursive call
    return p

with your input data, this produces (order may vary depending on python versions where dictionaries are unordered):

[('yaxis', 'linecolor'), ('yaxis', 'zerolinecolor'), ('geo', 'lakecolor'), 
('geo', 'caxis', 'linecolor'), ('geo', 'caxis', 'gridcolor'), ('geo', 'bgcolor')]

Upvotes: 3

Related Questions