Steve

Reputation: 101

How to compare nested dicts

a_standard = {
    'section1': {
        'category1': 1,
        'category2': 2
    },
    'section2': {
        'category1': 1,
        'category2': 2
    }
}

a_new = {
    'section1': {
        'category1': 1,
        'category2': 2
    },
    'section2': {
        'category1': 1,
        'category2': 3
    }
}

I want to find the difference between a_standard and a_new, which is at a_new['section2']['category2']: the value is 2 in a_standard and 3 in a_new.

Should I convert each to a set and then do difference or loop and compare the dict?
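Converting the dicts straight to sets does not work, because set(a_standard.items()) tries to hash the inner dicts and raises TypeError. A minimal sketch of the set approach, assuming every leaf value is hashable: flatten each dict into (key_path, value) pairs with a hypothetical helper flatten, then take set differences.

```python
def flatten(d, prefix=()):
    """Yield (key_path, value) pairs for every non-dict leaf of a nested dict."""
    for key, value in d.items():
        path = prefix + (key,)
        if isinstance(value, dict):
            yield from flatten(value, path)  # descend into the nested section
        else:
            yield path, value


a_standard = {
    'section1': {'category1': 1, 'category2': 2},
    'section2': {'category1': 1, 'category2': 2},
}
a_new = {
    'section1': {'category1': 1, 'category2': 2},
    'section2': {'category1': 1, 'category2': 3},
}

# set(a_standard.items()) fails with TypeError: unhashable type: 'dict',
# so flatten first; this assumes every leaf value is hashable.
old = set(flatten(a_standard))
new = set(flatten(a_new))

print(new - old)  # {(('section2', 'category2'), 3)}
print(old - new)  # {(('section2', 'category2'), 2)}
```

The loop-and-compare approaches in the answers avoid the hashability assumption and can report both values side by side.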

Upvotes: 3

Views: 10508

Answers (3)

Zephaniah Grunschlag

Reputation: 1083

There is a library called deepdiff that has a lot of options, but I find it to be somewhat unintuitive.

Here's a recursive function that I often use to compute diffs in my unit tests. It goes a bit beyond what the question asks, because it also handles nested lists. I hope you'll find it useful.

Function definition:

from copy import deepcopy


def deep_diff(x, y, parent_key=None, exclude_keys=(), epsilon_keys=()):
    """
    Take the deep diff of JSON-like dictionaries (and lists).

    Returns None when x and y are equal; otherwise returns a structure that
    mirrors the input, with (x_value, y_value) tuples at the differing leaves.

    No warranties when keys or values are None.
    """
    # Relative tolerance used for values whose key is listed in epsilon_keys
    EPSILON = 0.5
    rho = 1 - EPSILON

    if x == y:
        return None

    if parent_key in epsilon_keys:
        xfl, yfl = float_or_None(x), float_or_None(y)
        # Same-sign numbers whose magnitudes are within a factor of 1/rho count as equal
        if (xfl is not None and yfl is not None and xfl * yfl >= 0
                and rho * abs(xfl) <= abs(yfl) and rho * abs(yfl) <= abs(xfl)):
            return None

    # Leaves, or values of different types: report the pair as-is
    if type(x) != type(y) or type(x) not in (list, dict):
        return x, y

    if type(x) == dict:
        d = {}
        # Keys present on only one side
        for k in x.keys() ^ y.keys():
            if k in exclude_keys:
                continue
            if k in x:
                d[k] = (deepcopy(x[k]), None)
            else:
                d[k] = (None, deepcopy(y[k]))

        # Keys present on both sides: recurse
        for k in x.keys() & y.keys():
            if k in exclude_keys:
                continue
            next_d = deep_diff(x[k], y[k], parent_key=k,
                               exclude_keys=exclude_keys, epsilon_keys=epsilon_keys)
            if next_d is not None:
                d[k] = next_d

        return d if d else None

    # Otherwise x and y are both lists: compare element by element
    d = [None] * max(len(x), len(y))
    for i in range(min(len(x), len(y))):
        d[i] = deep_diff(x[i], y[i], parent_key=i,
                         exclude_keys=exclude_keys, epsilon_keys=epsilon_keys)

    # Trailing elements that exist only in the longer list
    for i in range(len(y), len(x)):
        d[i] = (x[i], None)
    for i in range(len(x), len(y)):
        d[i] = (None, y[i])

    return None if all(v is None for v in d) else d


# We need this helper function as well:
def float_or_None(x):
    try:
        return float(x)
    except (TypeError, ValueError):
        # float(None), float({}) etc. raise TypeError; float("abc") raises ValueError
        return None

Usage:

>>> deep_diff(a_standard, a_new)

{'section2': {'category2': (2, 3)}}

I think this output is a little more intuitive than that of the other answers.

In unit tests I'll do something like:

import json

diff = deep_diff(expected_out, out, exclude_keys=["flickery1", "flickery2"])
assert diff is None, json.dumps(diff, indent=2)

Upvotes: 4

CrizR

Reputation: 688

You can do this, assuming both dicts have the same keys:

def find_diff(dict1, dict2):
    differences = []
    for key in dict1.keys():
        if isinstance(dict1[key], dict):
            # Recurse into the nested dict and keep going with the remaining keys
            differences.extend(find_diff(dict1[key], dict2[key]))
        elif dict1[key] != dict2[key]:
            differences.append((key, dict1[key], dict2[key]))
    return differences

I’m typing on my phone right now, so sorry if the syntax is a little messed up.

Upvotes: 2

Ajax1234

Reputation: 71451

You can use recursion:

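a_standard = {
    'section1': {
        'category1': 1,
        'category2': 2
    },
    'section2': {
        'category1': 1,
        'category2': 2
    }
}

a_new = {
    'section1': {
        'category1': 1,
        'category2': 2
    },
    'section2': {
        'category1': 1,
        'category2': 3
    }
}

def differences(a, b, section=None):
    # Assumes a and b have the same keys in the same order. For each pair of entries:
    # two unequal leaves -> (key, old, new, section); two equal leaves -> None;
    # otherwise recurse, passing the current key as the section.
    return [(c, d, g, section) if all(not isinstance(i, dict) for i in [d, g]) and d != g
            else None if all(not isinstance(i, dict) for i in [d, g]) and d == g
            else differences(d, g, c)
            for [c, d], [h, g] in zip(a.items(), b.items())]

# Flatten one level of nesting and drop the None placeholders
n = list(filter(None, [i for b in differences(a_standard, a_new) for i in b]))
print(n)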

Output:

[('category2', 2, 3, 'section2')]

This yields the key whose values differ, the two values, and the section that contains it.

Edit: without list comprehension:

def differences(a, b, section=None):
    # Assumes a and b have the same keys in the same order
    for [c, d], [h, g] in zip(a.items(), b.items()):
        if not isinstance(d, dict) and not isinstance(g, dict):
            if d != g:
                yield (c, d, g, section)
        else:
            # Re-yield each difference tuple found in the nested dict
            yield from differences(d, g, c)

print(list(differences(a_standard, a_new)))

Output:

[('category2', 2, 3, 'section2')]

This solution uses a generator (hence the yield statement), which produces values lazily and only remembers where it left off between steps. You collect the values by passing the result to list(). Using yield makes it easy to accumulate the differences without an extra parameter or a global variable.
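Because the result is a generator, a caller that only needs to know whether anything differs can stop at the first difference with next() and a default of None, instead of building the whole list. A short sketch of that, assuming the same-keys-same-order structure the generator version already relies on:

```python
def differences(a, b, section=None):
    """Yield (key, old, new, section) for each differing leaf (same keys and order assumed)."""
    for (c, d), (_, g) in zip(a.items(), b.items()):
        if not isinstance(d, dict) and not isinstance(g, dict):
            if d != g:
                yield (c, d, g, section)
        else:
            yield from differences(d, g, c)


a_standard = {
    'section1': {'category1': 1, 'category2': 2},
    'section2': {'category1': 1, 'category2': 2},
}
a_new = {
    'section1': {'category1': 1, 'category2': 2},
    'section2': {'category1': 1, 'category2': 3},
}

# Stop at the first difference instead of walking the whole structure
first = next(differences(a_standard, a_new), None)
print(first)  # ('category2', 2, 3, 'section2')

# An exhausted generator means no differences were found
print(next(differences(a_standard, a_standard), None))  # None
```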

Upvotes: 3
