fdhex
fdhex

Reputation: 2532

Deep merge dictionaries of dictionaries in Python

I need to merge multiple dictionaries, here's what I have for instance:

dict1 = {1:{"a":{"A"}}, 2:{"b":{"B"}}}

dict2 = {2:{"c":{"C"}}, 3:{"d":{"D"}}}

With A B C and D being leaves of the tree, like {"info1":"value", "info2":"value2"}

There is an unknown level(depth) of dictionaries, it could be {2:{"c":{"z":{"y":{C}}}}}

In my case it represents a directory/files structure with nodes being docs and leaves being files.

I want to merge them to obtain:

 dict3 = {1:{"a":{"A"}}, 2:{"b":{"B"},"c":{"C"}}, 3:{"d":{"D"}}}

I'm not sure how I could do that easily with Python.

Upvotes: 233

Views: 161176

Answers (30)

Take Ichiru
Take Ichiru

Reputation: 76

Here is a revised version of my module: dict_deepmerge.py, which leverage the Toml's syntax but adding some extensive options and some security layers. But if you don't use pydantic (v2) then just use the typing module and replace my annotation with depth: int or maxdepth: int.

My solution is based on the diogo's answer on StackOverflow above on Mar 31, 2022, but added some custom changes with extensive options and some security layers.

Hope you are gonna like it and give some of your comments

from functools import lru_cache
from typing import Annotated, Literal, Any, Callable
from pydantic import PositiveInt, Field
from copy import deepcopy, copy

__all__ = ['deepmerge']

_immutable_types = (int, float, str, bool, type(None), tuple)
_mutable_types = (list, dict)      # Maybe to add set/frozenset in the future

_actions_l1 = Literal['override', 'bypass', 'terminate']
_copy_actions = Literal['copy', 'deepcopy']
_actions_l2 = Literal[_actions_l1, _copy_actions]
_actions_l3 = Literal[_actions_l2, 'extend-copy', 'extend-deepcopy']

# Security Limit
_max_depth: int = 6
_min_num_base_item_in_layer: int = 12
_max_num_base_item_in_layer: int = 768

@lru_cache(maxsize=_max_depth + 1)
def _max_num_items_in_depth(depth: Annotated[PositiveInt, Field(default=_max_depth // 2 + 1, le=_max_depth)]) -> int:
    return max(_min_num_base_item_in_layer, _max_num_base_item_in_layer // (4 ** depth))

# Heuristically determined maximum number of items in the dictionary
_max_num_conf: int = 100
_max_total_items_per_default_conf: int = sum(map(_max_num_items_in_depth, range(1, _max_depth + 1)))
_max_total_items_per_addition_conf: Callable[[int], int] = lambda num_args: 32 * max(num_args, _max_num_conf)


def _depth_count(a: _mutable_types) -> int:
    if isinstance(a, dict):
        return 1 + (max(map(_depth_count, a.values())) if a else 0)
    elif isinstance(a, (list, tuple, set)):
        return 1 + (max(map(_depth_count, a)) if a else 0)
    return 0


def _item_total_count(a: dict) -> int:
    if isinstance(a, dict):
        return len(a) + sum(map(_item_total_count, a.values()))
    elif isinstance(a, (list, tuple, set)):
        return len(a) + sum(map(_item_total_count, a))
    return 0


def _trigger_update(result: dict, key: Any, value: Any, trigger: _actions_l3) -> None:
    match trigger:
        case 'override':
            result[key] = value
        case 'bypass':
            pass
        case 'terminate':
            result.pop(key)
        case 'copy':
            result[key] = copy(value)
        case 'deepcopy':
            result[key] = deepcopy(value)
        case 'extend-copy':
            result[key].extend(copy(value))
        case 'extend-deepcopy':
            result[key].extend(deepcopy(value))
    return None


# =============================================================================
def _deepmerge(a: dict, b: dict, result: dict, path: list[str], /, merged_index_item: int, curdepth: int, maxdepth: int,
               not_available_immutable_action: _actions_l1, available_immutable_action: _actions_l1,
               not_available_immutable_tuple_action: _copy_actions, available_immutable_tuple_action: _copy_actions,
               not_available_mutable_action: _actions_l2, list_conflict_action: _actions_l3,
               skiperror: bool = False,) -> dict:
    # This serves as the second layer of protection to prevent we are actually going too deep.
    if curdepth >= maxdepth:
        raise RecursionError(f"The depth of the dictionary (={curdepth}) exceeds the maximum depth (={maxdepth}).")
    curdepth += 1
    max_num_items_allowed = _max_num_items_in_depth(curdepth)
    if len(a) + len(b) > 2 * max_num_items_allowed:
        raise RecursionError(f"The number of items in the dictionary exceeds twice maximum limit "
                             f"(={max_num_items_allowed}).")

    for bkey, bvalue in b.items():
        abkey_value = a[bkey]
        # bkey = str(bkey)                # Enforce the key to be a string. This is expected to be
        path.append(bkey)

        # If the key doesn't exist in A, add the B element to A. This means that all values in B are not existed in A
        if bkey not in a:
            if isinstance(bvalue, _immutable_types):
                _trigger_update(result, bkey, bvalue, not_available_immutable_action)
            elif isinstance(bvalue, _mutable_types):
                _trigger_update(result, bkey, bvalue, not_available_mutable_action)

            elif not skiperror:
                raise TypeError(f"Conflict at {'->'.join(path[0:curdepth])} in the #{merged_index_item} configuration.")

        # We don't care scenario where A have value but B doesn't have it.
        else:
            # If both are immutable types, perform the action of :var:`immutable_action` on result with the value in B
            if isinstance(abkey_value, _immutable_types) and isinstance(bvalue, _immutable_types):
                _trigger_update(result, bkey, bvalue, available_immutable_action)


            elif isinstance(abkey_value, _immutable_types) and isinstance(bvalue, _mutable_types):
                # I am not sure if we have JSON reference here
                if not skiperror:
                    raise TypeError(f"Conflict at {'->'.join(path[0:curdepth])} in the #{merged_index_item} "
                                    f"configuration as value in both side are heterogeneous of type")
                else:
                    # result[bkey] = deepcopy(bvalue)
                    pass

            elif isinstance(abkey_value, _mutable_types) and isinstance(bvalue, _immutable_types):
                if not skiperror:
                    raise TypeError(f"Conflict at {'->'.join(path[0:curdepth])} in the #{merged_index_item} "
                                    f"configuration as value in both side are heterogeneous of type")
                else:
                    # result[bkey] = deepcopy(bvalue)
                    pass

            elif isinstance(abkey_value, _mutable_types) and isinstance(bvalue, _mutable_types):
                # If the key value is a dict, both in A and in B, merge the dicts
                if isinstance(abkey_value, dict) and isinstance(bvalue, dict):
                    _deepmerge(abkey_value, bvalue, result[bkey], deepcopy(path), merged_index_item=merged_index_item,
                               curdepth=curdepth, maxdepth=maxdepth, skiperror=skiperror,
                               not_available_immutable_action=not_available_immutable_action,
                               available_immutable_action=available_immutable_action,
                               not_available_immutable_tuple_action=not_available_immutable_tuple_action,
                               available_immutable_tuple_action=available_immutable_tuple_action,
                               not_available_mutable_action=not_available_mutable_action,
                               list_conflict_action=list_conflict_action)
                elif isinstance(abkey_value, list) and isinstance(bvalue, list):
                    _trigger_update(result, bkey, bvalue, list_conflict_action)

                elif not skiperror:
                    raise TypeError(f"Conflict at {'->'.join(path[0:curdepth])} in the #{merged_index_item} "
                                    f"configuration as value in both side are heterogeneous or unsupported of type(s)")

            # If the key value is the same in A and in B, but this can ends up with [1, 2] = [1, 2]
            elif abkey_value == bvalue:
                pass

            # This is the edge-case where the value in A and B are not the same
            elif not skiperror:
                raise Exception(f"Conflict at {'->'.join(path[0:curdepth])} in the #{merged_index_item} configuration."
                                f" It can be the result of edge-case or non-supported type")

        # Pop the last value in the path
        path.pop()

    return result


def deepmerge(a: dict, *args: dict, maxdepth: Annotated[PositiveInt, Field(default=_max_depth // 2 + 1, le=_max_depth)],
              not_available_immutable_action: _actions_l1, available_immutable_action: _actions_l1,
              not_available_immutable_tuple_action: _copy_actions, available_immutable_tuple_action: _copy_actions,
              not_available_mutable_action: _actions_l2, list_conflict_action: _actions_l3,
              skiperror: bool = False,) -> dict:
    """
    Recursively merges and update two dictionaries. The result is always a new deepcopy of the dictionaries. Note
    that the function is not designed to handle circular references, and unable to prevent memory overflow when
    meeting unexpected large dictionaries.

    Parameters:
    ----------

    a: dict
        The first dictionary to be merged. This dictionary is (usually) the default configuration.

    *args: dict
        The other dictionaries to be merged. These dictionaries are (usually) the custom configuration that overrides
        the default configuration. If there are more than one custom configuration, they will be merged in order thus
        if there are conflicts, the last custom configuration will override the previous custom configuration.

        a (dict): The first dictionary.
        b (dict): The second dictionary.

    Keyword Parameters:
    ------------------

    maxdepth: PositiveInt
        The maximum depth of the dictionary to be merged. The default value is 4 and the maximum value is 6. We don't
        recommend to set the value to 8 since it may cause a stack overflow error. Also, setting max depth make it
        easier to have non-regression performance of path traversal

    skiperror: bool
        If True, the function will skip the error when it encounters a conflict. The default value is False means
        that no action is taken when a conflict is encountered, unsupported/heterogeneous type is found, or the
        dictionary is too deep.

    not_available_immutable_action: Literal['override', 'bypass', 'terminate']
        The action to be taken when the key is NOT available in the base/prior configuration and action is performed
        on immutable types.
        - 'override' (default): The value in the custom configuration will override the default configuration.
        - 'bypass': The value in the custom configuration will be ignored.
        - 'terminate': The key will be removed from the default configuration.

    available_immutable_action: Literal['override', 'bypass', 'terminate']
        The action to be taken when the key is available in the base/prior configuration and action is performed
        on immutable types.
        - 'override' (default): The value in the custom configuration will override the default configuration.
        - 'bypass': The value in the custom configuration will be ignored.
        - 'terminate': The key will be removed from the default configuration.

    not_available_immutable_tuple_action: Literal['copy', 'deepcopy']
        The action to be taken when the key is NOT available in the base/prior configuration and action is performed
        on immutable types.
        - 'copy' (default): The value will be copied.
        - 'deepcopy': The value will be deep copied.

    available_immutable_tuple_action: Literal['copy', 'deepcopy']
        The action to be taken when the key is available in the base/prior configuration and action is performed
        on tuple datatype.
        - 'copy' (default): The value will be copied.
        - 'deepcopy': The value will be deep copied.

    not_available_mutable_action: Literal['copy', 'deepcopy', 'bypass', 'terminate']
        The action to be taken when the key is NOT available in the base/prior configuration and action is performed
        on mutable types.
        - 'copy' (default): The value will be copied.
        - 'deepcopy': The value will be deep copied.
        - 'bypass': The value in the custom configuration will be ignored.
        - 'terminate': The key will be removed from the default configuration.

    list_conflict_action: Literal['extend-copy', 'extend-deepcopy', 'override-copy', 'override-deepcopy', 'terminate', 'bypass']
        The action to be taken when the key is available in the base/prior configuration and action is performed
        on list datatype.
        - 'extend-copy': The value in the custom configuration will be extended and copied.
        - 'extend-deepcopy': The value in the custom configuration will be extended and deep copied.
        - 'override-copy': The value in the custom configuration will override and copied.
        - 'override-deepcopy': The value in the custom configuration will override and deep copied.
        - 'terminate': The key will be removed from the default configuration.
        - 'bypass': The value in the custom configuration will be ignored.


    Returns:
    -------
        dict: The merged dictionary.


    Errors:
    ------
    RecursionError: The depth of the dictionary exceeds the maximum depth.
    TypeError: The value in both side are heterogeneous or unsupported of type(s).

    """
    if not args:
        return deepcopy(a)
    
    if not (1 <= maxdepth <= _max_depth):
        raise ValueError(f"The depth of the dictionary exceeds the maximum depth allowed (={_max_depth}).")

    if (num_args:= len(args)) > _max_num_conf:
        raise ValueError(f"The number of dictionaries to be merged exceeds the maximum limit (={_max_num_conf}).")

    if (a_maxdepth:= _depth_count(a)) > maxdepth:
        raise RecursionError(f"The depth of the first map (={a_maxdepth}) exceeds the maximum depth (={maxdepth})")
    if (a_maxitem:= _item_total_count(a)) > _max_total_items_per_default_conf:
        raise RecursionError(f"The number of items in the first map (={a_maxitem}) exceeds the maximum "
                             f"limit (={_max_total_items_per_default_conf}).")
    arg_maxitem: int = 0
    for arg in args:
        if (arg_maxdepth := _depth_count(arg)) > maxdepth:
            raise RecursionError(f"The depth of the map (={arg_maxdepth}) exceeds the maximum depth (={maxdepth}).")
        arg_maxitem += _item_total_count(arg)
    else:
        if arg_maxitem > _max_total_items_per_addition_conf(num_args):
            raise RecursionError(f"The number of items in the map (={arg_maxitem}) exceeds the maximum "
                                 f"limit (={_max_total_items_per_addition_conf(num_args)}).")

    result = deepcopy(a)
    for idx, arg in enumerate(args):
        result = _deepmerge(result, arg, result, [], merged_index_item=idx, curdepth=0, maxdepth=maxdepth,
                            skiperror=skiperror, not_available_immutable_action=not_available_immutable_action,
                            available_immutable_action=available_immutable_action,
                            not_available_immutable_tuple_action=not_available_immutable_tuple_action,
                            available_immutable_tuple_action=available_immutable_tuple_action,
                            not_available_mutable_action=not_available_mutable_action,
                            list_conflict_action=list_conflict_action)
    return result


Upvotes: 1

Ultra
Ultra

Reputation: 83

A functional programming style solution

__coalesce = lambda *arg: reduce(lambda x, y: x if x is not None else y, arg)
__allDictKeys = lambda dict1, dict2: set(dict1.keys()).union(dict2.keys())
__isDict = lambda k, dictionary: isinstance(dictionary.get(k), dict)

def __mapFnForDeepMerge(k, top_dict, bottom_dict) -> ():
    if not(__isDict(k, top_dict) or __isDict(k, bottom_dict)): # base case
        return (k, __coalesce(top_dict.get(k), bottom_dict.get(k)))
    elif __isDict(k, top_dict) and __isDict(k, bottom_dict): # merge both dict
        return (k, dict(deepMergeDict(top_dict[k], bottom_dict[k])))
    else:
        return (k, __coalesce(top_dict.get(k), bottom_dict.get(k)))

def __deepMergeDict(top_dict, bottom_dict) -> dict:
    top_dict = copy.deepcopy(top_dict) # prevent side effects
    bottom_dict = copy.deepcopy(bottom_dict) # prevent side effects
    allKeys = __allDictKeys(top_dict, bottom_dict)
    fn = lambda k: __mapFnForDeepMerge(k, top_dict, bottom_dict)
    result = map(fn, allKeys)
    return result

def deepMergeDict(top_dict, bottom_dict) -> dict:
    assert isinstance(top_dict, dict)
    assert isinstance(bottom_dict, dict)
    return __deepMergeDict(top_dict, bottom_dict)

assert (__allDictKeys({"a":1}, {"b":2})) == {'a', 'b'}
assert (__allDictKeys({"a":1}, {"a":2, "b":"1"})) == {'a', 'b'}
assert (dict(deepMergeDict( {"a":0}, {"a":1} ))) == {'a': 0}
assert (dict(deepMergeDict( {"a":1}, {"a":0} ))) == {'a': 1}
assert (dict(deepMergeDict( {"b":0}, {"a":1} ))) == {'a': 1, 'b': 0}
assert (dict(deepMergeDict( {"a":1, "c":{"e":1}, "d":{}, "g":{"a":1}, "h":1}, {"a":0, "c":{"f":1}, "e":{}, "g":0, "h":{"a":1}} ) )) == {'g': {'a': 1}, 'c': {'e': 1, 'f': 1}, 'h': 1, 'a': 1, 'e': {}, 'd': {}}

Upvotes: 0

sachin nagpal
sachin nagpal

Reputation: 11

This is a recursive solution, similar to the ones written above, but it doesn't raise any exception and simply merge the two dicts.

def merge(ans: dict,  a:dict ):
keys = a.keys()
for key in keys:
    if key in ans:
        ans[key] = merge(ans[key], a[key])
        return ans
    else:
        ans[key] = a[key]
return ans

Upvotes: 1

Asclepius
Asclepius

Reputation: 63282

The merge function below is is a more professional version of the answer by Ali as it avoids wastefully getting a value more than once. It works in-place.

The merge_new function below is not in-place. It returns a new dict. It does not rely on copy.deepcopy.

def merge(base: dict, update: dict) -> None:
    """Recursively merge `update` into `base` in-place."""
    for k, update_v in update.items():
        base_v = base.get(k)
        if isinstance(base_v, dict) and isinstance(update_v, dict):
            merge(base_v, update_v)
        else:
            base[k] = update_v

def merge_new(base: dict, update: dict) -> dict:
    """Return the updated result after recursively merging `update` into `base`."""
    result = base.copy()
    for k, update_v in update.items():
        base_v = result.get(k)
        if isinstance(base_v, dict) and isinstance(update_v, dict):
            result[k] = merge_new(base_v, update_v)
        else:
            result[k] = update_v
    return result

Test case:

test_data_base = {
    'a': 1,
    'b': {'c': 1, 'd': 2},
    'c': {'d': {'e': 0, 'f': 1, 'p': {'q': 4}}},
    'x': 0,
    'y': {'x': 3},
}

test_data_update = {
    'a': 9,
    'b': {'d': 3, 'e': 3},
    'c': {'d': {'e': 1, 'g': 8, 'p': {'r': 5, 's': 6}}, 'h': 7},
    'd': 6,
    'e': {'f': 10, 'g': 10},
}

test_expected_updated_data = {
    'a': 9,
    'b': {'c': 1, 'd': 3, 'e': 3},
    'c': {'d': {'e': 1, 'f': 1, 'p': {'q': 4, 'r': 5, 's': 6}, 'g': 8}, 'h': 7},
    'x': 0,
    'y': {'x': 3},
    'd': 6,
    'e': {'f': 10, 'g': 10},
}

# Test merge_new (not in-place)
import copy
test_data_base_copy = copy.deepcopy(test_data_base)
test_actual_updated_data = merge_new(test_data_base, test_data_update)
assert(test_actual_updated_data == test_expected_updated_data)
assert(test_data_base == test_data_base_copy)

# Test merge in-place
merge(test_data_base, test_data_update)
assert(test_data_base == test_expected_updated_data)

Upvotes: 0

Ali Sadeghi Ardestani
Ali Sadeghi Ardestani

Reputation: 31

The following function merges b into a.

def mergedicts(a, b):
    for key in b:
        if isinstance(a.get(key), dict) or isinstance(b.get(key), dict):
            mergedicts(a[key], b[key])
        else:
            a[key] = b[key]

Upvotes: 2

wong steve
wong steve

Reputation: 1

class Utils:
    """
    >>> a = { 'first' : { 'all_rows' : { 'pass' : 'dog', 'number' : '1' } } }
    >>> b = { 'first' : { 'all_rows' : { 'fail' : 'cat', 'number' : '5' } } }
    >>> Utils.merge_dict(b, a) == { 'first' : { 'all_rows' : { 'pass' : 'dog', 'fail' : 'cat', 'number' : '5' } } }
    True

    >>> main = {'a': {'b': {'test': 'bug'}, 'c': 'C'}}
    >>> suply = {'a': {'b': 2, 'd': 'D', 'c': {'test': 'bug2'}}}
    >>> Utils.merge_dict(main, suply) == {'a': {'b': {'test': 'bug'}, 'c': 'C', 'd': 'D'}}
    True

    """

    @staticmethod
    def merge_dict(main, suply):
        for key, value in suply.items():
            if key in main:
                if isinstance(main[key], dict):
                    if isinstance(value, dict):
                        Utils.merge_dict(main[key], value)
                    else:
                        pass
                else:
                    pass
            else:
                main[key] = value
        return main

if __name__ == '__main__':
    import doctest
    doctest.testmod()

Upvotes: 0

Tadeck
Tadeck

Reputation: 137320

This should help in merging all items from dict2 into dict1:

for item in dict2:
    if item in dict1:
        for leaf in dict2[item]:
            dict1[item][leaf] = dict2[item][leaf]
    else:
        dict1[item] = dict2[item]

Please test it and tell us whether this is what you wanted.

EDIT:

The above mentioned solution merges only one level, but correctly solves the example given by OP. To merge multiple levels, recursion should be used.

Upvotes: -1

user16779014
user16779014

Reputation:

This is a solution I made that recursively merges dictionaries. The first dictionary passed to the function is the master dictionary - values in it will overwrite the values in the same key in the second dictionary.

def merge(dict1: dict, dict2: dict) -> dict:
    merged = dict1

    for key in dict2:
        if type(dict2[key]) == dict:
            merged[key] = merge(dict1[key] if key in dict1 else {}, dict2[key])
        else:
            if key not in dict1.keys():
                merged[key] = dict2[key]

    return merged

Upvotes: 1

mateor
mateor

Reputation: 1369

I had two dictionaries (a and b) which could each contain any number of nested dictionaries. I wanted to recursively merge them, with b taking precedence over a.

Considering the nested dictionaries as trees, what I wanted was:

  • To update a so that every path to every leaf in b would be represented in a
  • To overwrite subtrees of a if a leaf is found in the corresponding path in b
    • Maintain the invariant that all b leaf nodes remain leafs.

The existing answers were a little complicated for my taste and left some details on the shelf. I implemented the following, which passes unit tests for my data set.

  def merge_map(a, b):
    if not isinstance(a, dict) or not isinstance(b, dict):
      return b

    for key in b:
      a[key] = merge_map(a[key], b[key]) if key in a else b[key]
    return a

Example (formatted for clarity):

 a = {
    1 : {'a': 'red', 
         'b': {'blue': 'fish', 'yellow': 'bear' },
         'c': { 'orange': 'dog'},
    },
    2 : {'d': 'green'},
    3: 'e'
  }

  b = {
    1 : {'b': 'white'},
    2 : {'d': 'black'},
    3: 'e'
  }


  >>> merge_map(a, b)
  {1: {'a': 'red', 
       'b': 'white',
       'c': {'orange': 'dog'},},
   2: {'d': 'black'},
   3: 'e'}

The paths in b that needed to be maintained were:

  • 1 -> 'b' -> 'white'
  • 2 -> 'd' -> 'black'
  • 3 -> 'e'.

a had the unique and non-conflicting paths of:

  • 1 -> 'a' -> 'red'
  • 1 -> 'c' -> 'orange' -> 'dog'

so they are still represented in the merged map.

Upvotes: 1

andrew cooke
andrew cooke

Reputation: 46882

This is actually quite tricky - particularly if you want a useful error message when things are inconsistent, while correctly accepting duplicate but consistent entries (something no other answer here does..)

Assuming you don't have huge numbers of entries, a recursive function is easiest:

def merge(a: dict, b: dict, path=[]):
    for key in b:
        if key in a:
            if isinstance(a[key], dict) and isinstance(b[key], dict):
                merge(a[key], b[key], path + [str(key)])
            elif a[key] != b[key]:
                raise Exception('Conflict at ' + '.'.join(path + [str(key)]))
        else:
            a[key] = b[key]
    return a

# works
print(merge({1:{"a":"A"},2:{"b":"B"}}, {2:{"c":"C"},3:{"d":"D"}}))
# has conflict
merge({1:{"a":"A"},2:{"b":"B"}}, {1:{"a":"A"},2:{"b":"C"}})

note that this mutates a - the contents of b are added to a (which is also returned). If you want to keep a you could call it like merge(dict(a), b).

agf pointed out (below) that you may have more than two dicts, in which case you can use:

from functools import reduce
reduce(merge, [dict1, dict2, dict3...])

where everything will be added to dict1.

Note: I edited my initial answer to mutate the first argument; that makes the "reduce" easier to explain

Upvotes: 220

themeasure43
themeasure43

Reputation: 51

If you are fine with giving values in the second dictionary priority over values in the first, then this can be done as easily as the following.

def merge_dict(bottom: dict, top: dict) -> dict:

    ret = {}

    for _tmp in (bottom, top):
        for k, v in _tmp.items():
            if isinstance(v, dict):
                if k not in ret:
                    ret[k] = v
                else:
                    ret[k] = merge_dict(ret[k], v)
            else:
                ret[k] = _tmp[k]
    return ret

Example:

d_bottom = {
    'A': 'bottom-A',
    'B': 'bottom-B',
    'D': {
        'DA': 'bottom-DA',
        'DB': 'bottom-DB',
        'DC': {
            "DCA": "bottom-DCA",
            "DCB": "bottom-DCB",
            "DCC": "bottom-DCC",
            "DCD": {
                "DCDA": 'bottom-DCDA',
                "DCDB": 'bottom-DCDB'
            }
        }
    }
}

d_top = {
    'A': 'top-A',
    'B': 'top-B',
    'D': {
        'DA': 'top-DA',
        'DB': 'top-DB',
        'DC': {
            'DCA': 'top-DCA',
            "DCD": {
                "DCDA": "top-DCDA"
            }
        },
        'DD': 'top-DD'
    },
    'C': {
        'CA': 'top-CA',
        'CB': {
            'CBA': 'top-CBA',
            'CBB': 'top-CBB',
            'CBC': {
                'CBCA': 'top-CBCA',
                'CBCB': {
                    'CBCBA': 'top-CBCBA'
                }
            }
        }
    }
}

print(json.dumps(merge_dict(d_bottom, d_top), indent=4))

Output:

{
    "A": "top-A",
    "B": "top-B",
    "D": {
        "DA": "top-DA",
        "DB": "top-DB",
        "DC": {
            "DCA": "top-DCA",
            "DCB": "bottom-DCB",
            "DCC": "bottom-DCC",
            "DCD": {
                "DCDA": "top-DCDA",
                "DCDB": "bottom-DCDB"
            }
        },
        "DD": "top-DD"
    },
    "C": {
        "CA": "top-CA",
        "CB": {
            "CBA": "top-CBA",
            "CBB": "top-CBB",
            "CBC": {
                "CBCA": "top-CBCA",
                "CBCB": {
                    "CBCBA": "top-CBCBA"
                }
            }
        }
    }
}

You can use it with functools.reduce() if you have more than two dictionaries to merge.

Upvotes: 1

Hans Bouwmeester
Hans Bouwmeester

Reputation: 1519

Short-n-sweet:

from collections.abc import MutableMapping as Map

def nested_update(d, v):
    """
    Nested update of dict-like 'd' with dict-like 'v'.
    """

    for key in v:
        if key in d and isinstance(d[key], Map) and isinstance(v[key], Map):
            nested_update(d[key], v[key])
        else:
            d[key] = v[key]

This works like (and is build on) Python's dict.update method. It returns None (you can always add return d if you prefer) as it updates dict d in-place. Keys in v will overwrite any existing keys in d (it does not try to interpret the dict's contents).

It will also work for other ("dict-like") mappings.

Example:

people = {'pete': {'gender': 'male'}, 'mary': {'age': 34}}
nested_update(people, {'pete': {'age': 41}})

# Pete's age was merged in
print(people)
{'pete': {'gender': 'male', 'age': 41}, 'mary': {'age': 34}}

Where Python's regular dict.update method yields:

people = {'pete': {'gender': 'male'}, 'mary': {'age': 34}}
people.update({'pete': {'age': 41}})

# We lost Pete's gender here!
print(people) 
{'pete': {'age': 41}, 'mary': {'age': 34}}

Upvotes: 8

Soudipta Dutta
Soudipta Dutta

Reputation: 2122

def m(a,b):
    aa = {
        k : dict(a.get(k,{}), **v) for k,v in b.items()
        }
    aap = print(aa)
    return aap

d1 = {1:{"a":"A"}, 2:{"b":"B"}}

d2 = {2:{"c":"C"}, 3:{"d":"D"}}

dict1 = {1:{"a":{1}}, 2:{"b":{2}}}

dict2 = {2:{"c":{222}}, 3:{"d":{3}}}

m(d1,d2)

m(dict1,dict2)

"""
Output :

{2: {'b': 'B', 'c': 'C'}, 3: {'d': 'D'}}


{2: {'b': {2}, 'c': {222}}, 3: {'d': {3}}}

"""

Upvotes: 0

user15964
user15964

Reputation: 2639

You can use the merge function from the toolz package, for example:

>>> import toolz
>>> dict1 = {1: {'a': 'A'}, 2: {'b': 'B'}}
>>> dict2 = {2: {'c': 'C'}, 3: {'d': 'D'}}
>>> toolz.merge_with(toolz.merge, dict1, dict2)
{1: {'a': 'A'}, 2: {'c': 'C'}, 3: {'d': 'D'}}

Upvotes: 3

Karl Knechtel
Karl Knechtel

Reputation: 61509

As noted in many other answers, a recursive algorithm makes the most sense here. In general, when working with recursion, it is preferable to create new values rather than trying to modify any input data structure.

We need to define what happens at each merge step. If both inputs are dictionaries, this is easy: we copy across unique keys from each side, and recursively merge the values of the duplicated keys. It's the base cases that cause a problem. It will be easier to understand the logic if we pull out a separate function for that. As a placeholder, we could just wrap the two values in a tuple:

def merge_leaves(x, y):
    return (x, y)

Now the core of our logic looks like:

def merge(x, y):
    if not(isinstance(x, dict) and isinstance(y, dict)):
        return merge_leaves(x, y)
    x_keys, y_keys = x.keys(), y.keys()
    result = { k: merge(x[k], y[k]) for k in x_keys & y_keys }
    result.update({k: x[k] for k in x_keys - y_keys})
    result.update({k: y[k] for k in y_keys - x_keys})
    return result

Let's test it:

>>> x = {'a': {'b': 'c', 'd': 'e'}, 'f': 1, 'g': {'h', 'i'}, 'j': None}
>>> y = {'a': {'d': 'e', 'h': 'i'}, 'f': {'b': 'c'}, 'g': 1, 'k': None}
>>> merge(x, y)
{'f': (1, {'b': 'c'}), 'g': ({'h', 'i'}, 1), 'a': {'d': ('e', 'e'), 'b': 'c', 'h': 'i'}, 'j': None, 'k': None}
>>> x # The originals are unmodified.
{'a': {'b': 'c', 'd': 'e'}, 'f': 1, 'g': {'h', 'i'}, 'j': None}
>>> y
{'a': {'d': 'e', 'h': 'i'}, 'f': {'b': 'c'}, 'g': 1, 'k': None}

We can easily modify the leaf-merging rule, for example:

def merge_leaves(x, y):
    try:
        return x + y
    except TypeError:
        return Ellipsis

and observe the effects:

>>> merge(x, y)
{'f': Ellipsis, 'g': Ellipsis, 'a': {'d': 'ee', 'b': 'c', 'h': 'i'}, 'j': None, 'k': None}

We could also potentially clean this up by using a third-party library to dispatch based on the type of the inputs. For example, using multipledispatch, we could do things like:

@dispatch(dict, dict)
def merge(x, y):
    x_keys, y_keys = x.keys(), y.keys()
    result = { k: merge(x[k], y[k]) for k in x_keys & y_keys }
    result.update({k: x[k] for k in x_keys - y_keys})
    result.update({k: y[k] for k in y_keys - x_keys})
    return result

@dispatch(str, str)
def merge(x, y):
    return x + y

@dispatch(tuple, tuple)
def merge(x, y):
    return x + y

@dispatch(list, list)
def merge(x, y):
    return x + y

@dispatch(int, int):
def merge(x, y):
    raise ValueError("integer value conflict")

@dispatch(object, object):
    return (x, y)

This allows us to handle various combinations of leaf-type special cases without writing our own type checking, and also replaces the type check in the main recursive function.

Upvotes: 3

diogo
diogo

Reputation: 771

Returning a merge without affecting the input dictionaries.

def _merge_dicts(dictA: Dict = {}, dictB: Dict = {}) -> Dict:
    # it suffices to pass as an argument a clone of `dictA`
    return _merge_dicts_aux(dictA, dictB, copy(dictA))


def _merge_dicts_aux(dictA: Dict = {}, dictB: Dict = {}, result: Dict = {}, path: List[str] = None) -> Dict:

    # conflict path, None if none
    if path is None:
        path = []

    for key in dictB:

        # if the key doesn't exist in A, add the B element to A
        if key not in dictA:
            result[key] = dictB[key]

        else:
            # if the key value is a dict, both in A and in B, merge the dicts
            if isinstance(dictA[key], dict) and isinstance(dictB[key], dict):
                _merge_dicts_aux(dictA[key], dictB[key], result[key], path + [str(key)])

            # if the key value is the same in A and in B, ignore
            elif dictA[key] == dictB[key]:
                pass

            # if the key value differs in A and in B, raise error
            else:
                err: str = f"Conflict at {'.'.join(path + [str(key)])}"
                raise Exception(err)

    return result

inspired by @andrew cooke's solution

Upvotes: 1

user176105
user176105

Reputation: 19

I have not extensively tested this, so, your feedback is encouraged.

from collections import defaultdict

dict1 = defaultdict(list)

dict2= defaultdict(list)

dict3= defaultdict(list)


dict1= dict(zip(Keys[ ],values[ ]))

dict2 = dict(zip(Keys[ ],values[ ]))


def mergeDict(dict1, dict2):

    dict3 = {**dict1, **dict2}

    for key, value in dict3.items():

        if key in dict1 and key in dict2:

           dict3[key] = [value , dict1[key]]

    return dict3

dict3 = mergeDict(dict1, dict2)

#sort keys alphabetically.

dict3.keys()

Merge two dictionaries and add values of common keys

Upvotes: 0

Aaron Hall
Aaron Hall

Reputation: 394965

Dictionaries of dictionaries merge

As this is the canonical question (in spite of certain non-generalities) I'm providing the canonical Pythonic approach to solving this issue.

Simplest Case: "leaves are nested dicts that end in empty dicts":

d1 = {'a': {1: {'foo': {}}, 2: {}}}
d2 = {'a': {1: {}, 2: {'bar': {}}}}
d3 = {'b': {3: {'baz': {}}}}
d4 = {'a': {1: {'quux': {}}}}

This is the simplest case for recursion, and I would recommend two naive approaches:

def rec_merge1(d1, d2):
    '''return new merged dict of dicts'''
    for k, v in d1.items(): # in Python 2, use .iteritems()!
        if k in d2:
            d2[k] = rec_merge1(v, d2[k])
    d3 = d1.copy()
    d3.update(d2)
    return d3

def rec_merge2(d1, d2):
    '''update first dict with second recursively'''
    for k, v in d1.items(): # in Python 2, use .iteritems()!
        if k in d2:
            d2[k] = rec_merge2(v, d2[k])
    d1.update(d2)
    return d1

I believe I would prefer the second to the first, but keep in mind that the original state of the first would have to be rebuilt from its origin. Here's the usage:

>>> from functools import reduce # only required for Python 3.
>>> reduce(rec_merge1, (d1, d2, d3, d4))
{'a': {1: {'quux': {}, 'foo': {}}, 2: {'bar': {}}}, 'b': {3: {'baz': {}}}}
>>> reduce(rec_merge2, (d1, d2, d3, d4))
{'a': {1: {'quux': {}, 'foo': {}}, 2: {'bar': {}}}, 'b': {3: {'baz': {}}}}

Complex Case: "leaves are of any other type:"

So if they end in dicts, it's a simple case of merging the end empty dicts. If not, it's not so trivial. If strings, how do you merge them? Sets can be updated similarly, so we could give that treatment, but we lose the order in which they were merged. So does order matter?

So in lieu of more information, the simplest approach will be to give them the standard update treatment if both values are not dicts: i.e. the second dict's value will overwrite the first, even if the second dict's value is None and the first's value is a dict with a lot of info.

d1 = {'a': {1: 'foo', 2: None}}
d2 = {'a': {1: None, 2: 'bar'}}
d3 = {'b': {3: 'baz'}}
d4 = {'a': {1: 'quux'}}

from collections.abc import MutableMapping

def rec_merge(d1, d2):
    '''
    Update two dicts of dicts recursively, 
    if either mapping has leaves that are non-dicts, 
    the second's leaf overwrites the first's.
    '''
    for k, v in d1.items():
        if k in d2:
            # this next check is the only difference!
            if all(isinstance(e, MutableMapping) for e in (v, d2[k])):
                d2[k] = rec_merge(v, d2[k])
            # we could further check types and merge as appropriate here.
    d3 = d1.copy()
    d3.update(d2)
    return d3

And now

from functools import reduce
reduce(rec_merge, (d1, d2, d3, d4))

returns

{'a': {1: 'quux', 2: 'bar'}, 'b': {3: 'baz'}}

Application to the original question:

I've had to remove the curly braces around the letters and put them in single quotes for this to be legit Python (else they would be set literals in Python 2.7+) as well as append a missing brace:

dict1 = {1:{"a":'A'}, 2:{"b":'B'}}
dict2 = {2:{"c":'C'}, 3:{"d":'D'}}

and rec_merge(dict1, dict2) now returns:

{1: {'a': 'A'}, 2: {'c': 'C', 'b': 'B'}, 3: {'d': 'D'}}

Which matches the desired outcome of the original question (after changing, e.g. the {A} to 'A'.)

Upvotes: 28

Sascha
Sascha

Reputation: 219

Overview

The following approach subdivides the problem of a deep merge of dicts into:

  1. A parameterized shallow merge function merge(f)(a,b) that uses a function f to merge two dicts a and b

  2. A recursive merger function f to be used together with merge


Implementation

A function for merging two (non nested) dicts can be written in a lot of ways. I personally like

def merge(f):
    def merge(a,b): 
        keys = a.keys() | b.keys()
        return {key:f(a.get(key), b.get(key)) for key in keys}
    return merge

A nice way of defining an appropriate recursive merger function f is using multipledispatch which allows to define functions that evaluate along different paths depending on the type of their arguments.

from multipledispatch import dispatch

#for anything that is not a dict return
@dispatch(object, object)
def f(a, b):
    return b if b is not None else a

#for dicts recurse 
@dispatch(dict, dict)
def f(a,b):
    return merge(f)(a,b)

Example

To merge two nested dicts simply use merge(f) e.g.:

dict1 = {1:{"a":"A"},2:{"b":"B"}}
dict2 = {2:{"c":"C"},3:{"d":"D"}}
merge(f)(dict1, dict2)
#returns {1: {'a': 'A'}, 2: {'b': 'B', 'c': 'C'}, 3: {'d': 'D'}} 

Notes:

The advantages of this approach are:

  • The function is build from smaller functions that each do a single thing which makes the code simpler to reason about and test

  • The behaviour is not hard-coded but can be changed and extended as needed which improves code reuse (see example below).


Customization

Some answers also considered dicts that contain lists e.g. of other (potentially nested) dicts. In this case one might want map over the lists and merge them based on position. This can be done by adding another definition to the merger function f:

import itertools
@dispatch(list, list)
def f(a,b):
    return [merge(f)(*arg) for arg in itertools.zip_longest(a, b)]

Upvotes: 5

Osiloke
Osiloke

Reputation: 269

Based on @andrew cooke. This version handles nested lists of dicts and also allows the option to update the values

def merge(a, b, path=None, update=True):
    "http://stackoverflow.com/questions/7204805/python-dictionaries-of-dictionaries-merge"
    "merges b into a"
    if path is None: path = []
    for key in b:
        if key in a:
            if isinstance(a[key], dict) and isinstance(b[key], dict):
                merge(a[key], b[key], path + [str(key)])
            elif a[key] == b[key]:
                pass # same leaf value
            elif isinstance(a[key], list) and isinstance(b[key], list):
                for idx, val in enumerate(b[key]):
                    a[key][idx] = merge(a[key][idx], b[key][idx], path + [str(key), str(idx)], update=update)
            elif update:
                a[key] = b[key]
            else:
                raise Exception('Conflict at %s' % '.'.join(path + [str(key)]))
        else:
            a[key] = b[key]
    return a

Upvotes: 20

Alon G
Alon G

Reputation: 3373

I have an iterative solution - works much much better with big dicts & a lot of them (for example jsons etc):

import collections


def merge_dict_with_subdicts(dict1: dict, dict2: dict) -> dict:
    """
    similar behaviour to builtin dict.update - but knows how to handle nested dicts
    """
    q = collections.deque([(dict1, dict2)])
    while len(q) > 0:
        d1, d2 = q.pop()
        for k, v in d2.items():
            if k in d1 and isinstance(d1[k], dict) and isinstance(v, dict):
                q.append((d1[k], v))
            else:
                d1[k] = v

    return dict1

note that this will use the value in d2 to override d1, in case they are not both dicts. (same as python's dict.update())

some tests:

def test_deep_update():
    d = dict()
    merge_dict_with_subdicts(d, {"a": 4})
    assert d == {"a": 4}

    new_dict = {
        "b": {
            "c": {
                "d": 6
            }
        }
    }
    merge_dict_with_subdicts(d, new_dict)
    assert d == {
        "a": 4,
        "b": {
            "c": {
                "d": 6
            }
        }
    }

    new_dict = {
        "a": 3,
        "b": {
            "f": 7
        }
    }
    merge_dict_with_subdicts(d, new_dict)
    assert d == {
        "a": 3,
        "b": {
            "c": {
                "d": 6
            },
            "f": 7
        }
    }

    # test a case where one of the dicts has dict as value and the other has something else
    new_dict = {
        'a': {
            'b': 4
        }
    }
    merge_dict_with_subdicts(d, new_dict)
    assert d['a']['b'] == 4

I've tested with around ~1200 dicts - this method took 0.4 seconds, while the recursive solution took ~2.5 seconds.

Upvotes: 3

kemri
kemri

Reputation: 179

How about another answer?!? This one also avoids mutation/side effects:

def merge(dict1, dict2):
    output = {}

    # adds keys from `dict1` if they do not exist in `dict2` and vice-versa
    intersection = {**dict2, **dict1}

    for k_intersect, v_intersect in intersection.items():
        if k_intersect not in dict1:
            v_dict2 = dict2[k_intersect]
            output[k_intersect] = v_dict2

        elif k_intersect not in dict2:
            output[k_intersect] = v_intersect

        elif isinstance(v_intersect, dict):
            v_dict2 = dict2[k_intersect]
            output[k_intersect] = merge(v_intersect, v_dict2)

        else:
            output[k_intersect] = v_intersect

    return output

dict1 = {1:{"a":{"A"}}, 2:{"b":{"B"}}}
dict2 = {2:{"c":{"C"}}, 3:{"d":{"D"}}}
dict3 = {1:{"a":{"A"}}, 2:{"b":{"B"},"c":{"C"}}, 3:{"d":{"D"}}}

assert dict3 == merge(dict1, dict2)

Upvotes: 1

conmak
conmak

Reputation: 1470

And just another slight variation:

Here is a pure python3 set based deep update function. It updates nested dictionaries by looping through one level at a time and calls itself to update each next level of dictionary values:

def deep_update(dict_original, dict_update):
    if isinstance(dict_original, dict) and isinstance(dict_update, dict):
        output=dict(dict_original)
        keys_original=set(dict_original.keys())
        keys_update=set(dict_update.keys())
        similar_keys=keys_original.intersection(keys_update)
        similar_dict={key:deep_update(dict_original[key], dict_update[key]) for key in similar_keys}
        new_keys=keys_update.difference(keys_original)
        new_dict={key:dict_update[key] for key in new_keys}
        output.update(similar_dict)
        output.update(new_dict)
        return output
    else:
        return dict_update

A simple example:

x={'a':{'b':{'c':1, 'd':1}}}
y={'a':{'b':{'d':2, 'e':2}}, 'f':2}

print(deep_update(x, y))
>>> {'a': {'b': {'c': 1, 'd': 2, 'e': 2}}, 'f': 2}

Upvotes: 1

Travis Clarke
Travis Clarke

Reputation: 6671

You could try mergedeep.


Installation

$ pip3 install mergedeep

Usage

from mergedeep import merge

a = {"keyA": 1}
b = {"keyB": {"sub1": 10}}
c = {"keyB": {"sub2": 20}}

merge(a, b, c) 

print(a)
# {"keyA": 1, "keyB": {"sub1": 10, "sub2": 20}}

For a full list of options, check out the docs!

Upvotes: 65

Dorcioman
Dorcioman

Reputation: 636

from collections import defaultdict
from itertools import chain

class DictHelper:

@staticmethod
def merge_dictionaries(*dictionaries, override=True):
    merged_dict = defaultdict(set)
    all_unique_keys = set(chain(*[list(dictionary.keys()) for dictionary in dictionaries]))  # Build a set using all dict keys
    for key in all_unique_keys:
        keys_value_type = list(set(filter(lambda obj_type: obj_type != type(None), [type(dictionary.get(key, None)) for dictionary in dictionaries])))
        # Establish the object type for each key, return None if key is not present in dict and remove None from final result
        if len(keys_value_type) != 1:
            raise Exception("Different objects type for same key: {keys_value_type}".format(keys_value_type=keys_value_type))

        if keys_value_type[0] == list:
            values = list(chain(*[dictionary.get(key, []) for dictionary in dictionaries]))  # Extract the value for each key
            merged_dict[key].update(values)

        elif keys_value_type[0] == dict:
            # Extract all dictionaries by key and enter in recursion
            dicts_to_merge = list(filter(lambda obj: obj != None, [dictionary.get(key, None) for dictionary in dictionaries]))
            merged_dict[key] = DictHelper.merge_dictionaries(*dicts_to_merge)

        else:
            # if override => get value from last dictionary else make a list of all values
            values = list(filter(lambda obj: obj != None, [dictionary.get(key, None) for dictionary in dictionaries]))
            merged_dict[key] = values[-1] if override else values

    return dict(merged_dict)



if __name__ == '__main__':
  d1 = {'aaaaaaaaa': ['to short', 'to long'], 'bbbbb': ['to short', 'to long'], "cccccc": ["the is a test"]}
  d2 = {'aaaaaaaaa': ['field is not a bool'], 'bbbbb': ['field is not a bool']}
  d3 = {'aaaaaaaaa': ['filed is not a string', "to short"], 'bbbbb': ['field is not an integer']}
  print(DictHelper.merge_dictionaries(d1, d2, d3))

  d4 = {"a": {"x": 1, "y": 2, "z": 3, "d": {"x1": 10}}}
  d5 = {"a": {"x": 10, "y": 20, "d": {"x2": 20}}}
  print(DictHelper.merge_dictionaries(d4, d5))

Output:

{'bbbbb': {'to long', 'field is not an integer', 'to short', 'field is not a bool'}, 
'aaaaaaaaa': {'to long', 'to short', 'filed is not a string', 'field is not a bool'}, 
'cccccc': {'the is a test'}}

{'a': {'y': 20, 'd': {'x1': 10, 'x2': 20}, 'z': 3, 'x': 10}}

Upvotes: 0

Vikas Kumar
Vikas Kumar

Reputation: 91

Based on answers from @andrew cooke. It takes care of nested lists in a better way.

def deep_merge_lists(original, incoming):
    """
    Deep merge two lists. Modifies original.
    Recursively call deep merge on each correlated element of list. 
    If item type in both elements are
     a. dict: Call deep_merge_dicts on both values.
     b. list: Recursively call deep_merge_lists on both values.
     c. any other type: Value is overridden.
     d. conflicting types: Value is overridden.

    If length of incoming list is more that of original then extra values are appended.
    """
    common_length = min(len(original), len(incoming))
    for idx in range(common_length):
        if isinstance(original[idx], dict) and isinstance(incoming[idx], dict):
            deep_merge_dicts(original[idx], incoming[idx])

        elif isinstance(original[idx], list) and isinstance(incoming[idx], list):
            deep_merge_lists(original[idx], incoming[idx])

        else:
            original[idx] = incoming[idx]

    for idx in range(common_length, len(incoming)):
        original.append(incoming[idx])


def deep_merge_dicts(original, incoming):
    """
    Deep merge two dictionaries. Modifies original.
    For key conflicts if both values are:
     a. dict: Recursively call deep_merge_dicts on both values.
     b. list: Call deep_merge_lists on both values.
     c. any other type: Value is overridden.
     d. conflicting types: Value is overridden.

    """
    for key in incoming:
        if key in original:
            if isinstance(original[key], dict) and isinstance(incoming[key], dict):
                deep_merge_dicts(original[key], incoming[key])

            elif isinstance(original[key], list) and isinstance(incoming[key], list):
                deep_merge_lists(original[key], incoming[key])

            else:
                original[key] = incoming[key]
        else:
            original[key] = incoming[key]

Upvotes: 8

David Schneider
David Schneider

Reputation: 307

In case someone wants yet another approach to this problem, here's my solution.

Virtues: short, declarative, and functional in style (recursive, does no mutation).

Potential Drawback: This might not be the merge you're looking for. Consult the docstring for semantics.

def deep_merge(a, b):
    """
    Merge two values, with `b` taking precedence over `a`.

    Semantics:
    - If either `a` or `b` is not a dictionary, `a` will be returned only if
      `b` is `None`. Otherwise `b` will be returned.
    - If both values are dictionaries, they are merged as follows:
        * Each key that is found only in `a` or only in `b` will be included in
          the output collection with its value intact.
        * For any key in common between `a` and `b`, the corresponding values
          will be merged with the same semantics.
    """
    if not isinstance(a, dict) or not isinstance(b, dict):
        return a if b is None else b
    else:
        # If we're here, both a and b must be dictionaries or subtypes thereof.

        # Compute set of all keys in both dictionaries.
        keys = set(a.keys()) | set(b.keys())

        # Build output dictionary, merging recursively values with common keys,
        # where `None` is used to mean the absence of a value.
        return {
            key: deep_merge(a.get(key), b.get(key))
            for key in keys
        }

Upvotes: 9

SlackSpace
SlackSpace

Reputation: 1

hey there I also had the same problem but I though of a solution and I will post it here, in case it is also useful for others, basically merging nested dictionaries and also adding the values, for me I needed to calculate some probabilities so this one worked great:

#used to copy a nested dict to a nested dict
def deepupdate(target, src):
    for k, v in src.items():
        if k in target:
            for k2, v2 in src[k].items():
                if k2 in target[k]:
                    target[k][k2]+=v2
                else:
                    target[k][k2] = v2
        else:
            target[k] = copy.deepcopy(v)

by using the above method we can merge:

target = {'6,6': {'6,63': 1}, '63,4': {'4,4': 1}, '4,4': {'4,3': 1}, '6,63': {'63,4': 1}}

src = {'5,4': {'4,4': 1}, '5,5': {'5,4': 1}, '4,4': {'4,3': 1}}

and this will become: {'5,5': {'5,4': 1}, '5,4': {'4,4': 1}, '6,6': {'6,63': 1}, '63,4': {'4,4': 1}, '4,4': {'4,3': 2}, '6,63': {'63,4': 1}}

also notice the changes here:

target = {'6,6': {'6,63': 1}, '6,63': {'63,4': 1}, '4,4': {'4,3': 1}, '63,4': {'4,4': 1}}

src = {'5,4': {'4,4': 1}, '4,3': {'3,4': 1}, '4,4': {'4,9': 1}, '3,4': {'4,4': 1}, '5,5': {'5,4': 1}}

merge = {'5,4': {'4,4': 1}, '4,3': {'3,4': 1}, '6,63': {'63,4': 1}, '5,5': {'5,4': 1}, '6,6': {'6,63': 1}, '3,4': {'4,4': 1}, '63,4': {'4,4': 1}, '4,4': {'4,3': 1, '4,9': 1}}

dont forget to also add the import for copy:

import copy

Upvotes: 0

Slava
Slava

Reputation: 1524

I have another slightly different solution here:

def deepMerge(d1, d2, inconflict = lambda v1,v2 : v2) :
''' merge d2 into d1. using inconflict function to resolve the leaf conflicts '''
    for k in d2:
        if k in d1 : 
            if isinstance(d1[k], dict) and isinstance(d2[k], dict) :
                deepMerge(d1[k], d2[k], inconflict)
            elif d1[k] != d2[k] :
                d1[k] = inconflict(d1[k], d2[k])
        else :
            d1[k] = d2[k]
    return d1

By default it resolves conflicts in favor of values from the second dict, but you can easily override this, with some witchery you may be able to even throw exceptions out of it. :).

Upvotes: 0

Guy Gangemi
Guy Gangemi

Reputation: 1773

Since dictviews support set operations, I was able to greatly simplify jterrace's answer.

def merge(dict1, dict2):
    for k in dict1.keys() - dict2.keys():
        yield (k, dict1[k])

    for k in dict2.keys() - dict1.keys():
        yield (k, dict2[k])

    for k in dict1.keys() & dict2.keys():
        yield (k, dict(merge(dict1[k], dict2[k])))

Any attempt to combine a dict with a non dict (technically, an object with a 'keys' method and an object without a 'keys' method) will raise an AttributeError. This includes both the initial call to the function and recursive calls. This is exactly what I wanted so I left it. You could easily catch an AttributeErrors thrown by the recursive call and then yield any value you please.

Upvotes: 2

Related Questions