Scott Colby
Scott Colby

Reputation: 1430

Merge Dictionaries Preserving Old Keys and New Values

I'm writing a Python script that parses RSS feeds. I want to maintain a dictionary of entries from the feed that gets updated periodically. Entries that no longer exist in the feed should be removed, new entries should get a default value, and the values for previously seen entries should remain unchanged.

This is best explained by example, I think:

>>> old = {
...     'a': 1,
...     'b': 2,
...     'c': 3
... }
>>> new = {
...     'c': 'x',
...     'd': 'y',
...     'e': 'z'
... }
>>> out = some_function(old, new)
>>> out
{'c': 3, 'd': 'y', 'e': 'z'}

Here's my current attempt at this:

def merge_preserving_old_values_and_new_keys(old, new):
       out = {}
       for k, v in new.items():
           out[k] = v
       for k, v in old.items():
           if k in out:
               out[k] = v
       return out

This works, but it seems to me there might be a better or more clever way.

EDIT: If you feel like testing your function:

def my_merge(old, new):
    pass

old = {'a': 1, 'b': 2, 'c': 3}
new = {'c': 'x', 'd': 'y', 'e': 'z'}

out = my_merge(old, new)
assert out == {'c': 3, 'd': 'y', 'e': 'z'}

EDIT 2: Defining Martijn Pieters' answer as set_merge, bravosierra99's as loop_merge, and my first attempt as orig_merge, I get the following timing results:

>>> setup="""
... old = {'a': 1, 'b': 2, 'c': 3}
... new = {'c': 'x', 'd': 'y', 'e': 'z'}
... from __main__ import set_merge, loop_merge, orig_merge
... """
>>> timeit.timeit('set_merge(old, new)', setup=setup)
3.4415210600000137
>>> timeit.timeit('loop_merge(old, new)', setup=setup)
1.161155690000669
>>> timeit.timeit('orig_merge(old, new)', setup=setup)
1.1776735319999716

I find this surprising, since I didn't expect the dictionary view approach to be that much slower.

Upvotes: 1

Views: 245

Answers (4)

Scott Colby
Scott Colby

Reputation: 1430

I'm not 100% the best way to add this information to the discussion: feel free to edit/redistribute it if necessary.

Here are timing results for all of the methods discussed here.

from timeit import timeit

def loop_merge(old, new):
    out = {}
    for k, v in new.items():
        if k in old:
            out[k] = old[k]
        else:
                out[k] = v
    return out

def set_merge(old, new):
    out = new.copy()
    out.update((k, old[k]) for k in old.keys() & new.keys())
    return out

def comp_merge(old, new):
    return {k: old[k] if k in old else v for k, v in new.items()}

def orig_merge(old, new):
    out = {}
    for k, v in new.items():
        out[k] = v
    for k, v in old.items():
        if k in out:
            out[k] = v
    return out


old = {'a': 1, 'b': 2, 'c': 3}
new = {'c': 'x', 'd': 'y', 'e': 'z'}
out = {'c': 3, 'd': 'y', 'e': 'z'}

assert loop_merge(old, new) == out
assert set_merge(old, new) == out
assert comp_merge(old, new) == out
assert orig_merge(old, new) == out

setup = """
from __main__ import old, new, loop_merge, set_merge, comp_merge, orig_merge
"""

for a in ['loop', 'set', 'comp', 'orig']:
    time = timeit('{}_merge(old, new)'.format(a), setup=setup)
    print('{}: {}'.format(a, time))

size = 10**4
large_old = {i: 'old' for i in range(size)}
large_new = {i: 'new' for i in range(size//2, size)}

setup = """
from __main__ import large_old, large_new, loop_merge, set_merge, comp_merge, orig_merge
"""

for a in ['loop', 'set', 'comp', 'orig']:
    time = timeit('{}_merge(large_old, large_new)'.format(a), setup=setup)
    print('{}: {}'.format(a, time))

The winner is the improved looping method!

$ python3 merge.py
loop: 0.7791572390015062  # small dictionaries
set: 3.1920828100010112
comp: 1.1180207730030816
orig: 1.1681104259987478
loop: 927.2149353210007  # large dictionaries
set: 1696.8342713210004
comp: 902.039078668
orig: 1373.0389542560006

I'm disappointed, because the dictionary view/set operation method is much cooler.

With larger dictionaries (10^4 items), the dictionary comprehension method pulls ahead of the improved looping method and far ahead of the original method. The set operation method still performs the slowest.

Upvotes: 0

be_good_do_good
be_good_do_good

Reputation: 4441

old = {
    'a': 1,
    'b': 2,
    'c': 3
}
new = {
    'c': 'x',
    'd': 'y',
    'e': 'z'
}

def merge_preserving_old_values_and_new_keys(o, n):
    out = {}
    for k in n:
        if k in o:
            out[k] = o[k]
        else:
            out[k] = n[k]
    return out

print merge_preserving_old_values_and_new_keys(old, new)

Upvotes: 0

Martijn Pieters
Martijn Pieters

Reputation: 1123850

Dictionaries have dictionary view objects that act as sets. Use these to get the intersection between old and new:

def merge_preserving_old_values_and_new_keys(old, new):
    result = new.copy()
    result.update((k, old[k]) for k in old.viewkeys() & new.viewkeys())
    return result

The above uses the Python 2 syntax; use old.keys() & new.keys() if you are using Python 3, for the same results:

def merge_preserving_old_values_and_new_keys(old, new):
    # Python 3 version
    result = new.copy()
    result.update((k, old[k]) for k in old.keys() & new.keys())
    return result

The above takes all key-value pairs from new as a starting point, then adds the values for old for any key that appears in both.

Demo:

>>> merge_preserving_old_values_and_new_keys(old, new)
{'c': 3, 'e': 'z', 'd': 'y'}

Note that the function, like your version, produces a new dictionary (albeit that the key and value objects are shared; it is a shallow copy).

You could also just update the new dictionary in-place if you don't need that new dictionary for anything else:

def merge_preserving_old_values_and_new_keys(old, new):
    new.update((k, old[k]) for k in old.viewkeys() & new.viewkeys())
    return new

You could also use a one-liner dict comprehension to build a new dictionary:

def merge_preserving_old_values_and_new_keys(old, new):
    return {k: old[k] if k in old else v for k, v in new.items()}

Upvotes: 5

bravosierra99
bravosierra99

Reputation: 1371

This should be more efficient, since you are no longer iterating through the entire old.items(). Additionally, it's more clear what you are trying to do this way since you aren't overwriting some values.

for k, v in new.items():
    if k in old.keys():
      out[k] = old[k]
    else:
      out[k] = v
return out

Upvotes: 2

Related Questions