Reputation: 1430
I'm writing a Python script that parses RSS feeds. I want to maintain a dictionary of entries from the feed that gets updated periodically. Entries that no longer exist in the feed should be removed, new entries should get a default value, and the values for previously seen entries should remain unchanged.
This is best explained by example, I think:
>>> old = {
... 'a': 1,
... 'b': 2,
... 'c': 3
... }
>>> new = {
... 'c': 'x',
... 'd': 'y',
... 'e': 'z'
... }
>>> out = some_function(old, new)
>>> out
{'c': 3, 'd': 'y', 'e': 'z'}
Here's my current attempt at this:
def merge_preserving_old_values_and_new_keys(old, new):
out = {}
for k, v in new.items():
out[k] = v
for k, v in old.items():
if k in out:
out[k] = v
return out
This works, but it seems to me there might be a better or more clever way.
EDIT: If you feel like testing your function:
def my_merge(old, new):
pass
old = {'a': 1, 'b': 2, 'c': 3}
new = {'c': 'x', 'd': 'y', 'e': 'z'}
out = my_merge(old, new)
assert out == {'c': 3, 'd': 'y', 'e': 'z'}
EDIT 2:
Defining Martijn Pieters' answer as set_merge
, bravosierra99's as loop_merge
, and my first attempt as orig_merge
, I get the following timing results:
>>> setup="""
... old = {'a': 1, 'b': 2, 'c': 3}
... new = {'c': 'x', 'd': 'y', 'e': 'z'}
... from __main__ import set_merge, loop_merge, orig_merge
... """
>>> timeit.timeit('set_merge(old, new)', setup=setup)
3.4415210600000137
>>> timeit.timeit('loop_merge(old, new)', setup=setup)
1.161155690000669
>>> timeit.timeit('orig_merge(old, new)', setup=setup)
1.1776735319999716
I find this surprising, since I didn't expect the dictionary view approach to be that much slower.
Upvotes: 1
Views: 245
Reputation: 1430
I'm not 100% the best way to add this information to the discussion: feel free to edit/redistribute it if necessary.
Here are timing results for all of the methods discussed here.
from timeit import timeit
def loop_merge(old, new):
out = {}
for k, v in new.items():
if k in old:
out[k] = old[k]
else:
out[k] = v
return out
def set_merge(old, new):
out = new.copy()
out.update((k, old[k]) for k in old.keys() & new.keys())
return out
def comp_merge(old, new):
return {k: old[k] if k in old else v for k, v in new.items()}
def orig_merge(old, new):
out = {}
for k, v in new.items():
out[k] = v
for k, v in old.items():
if k in out:
out[k] = v
return out
old = {'a': 1, 'b': 2, 'c': 3}
new = {'c': 'x', 'd': 'y', 'e': 'z'}
out = {'c': 3, 'd': 'y', 'e': 'z'}
assert loop_merge(old, new) == out
assert set_merge(old, new) == out
assert comp_merge(old, new) == out
assert orig_merge(old, new) == out
setup = """
from __main__ import old, new, loop_merge, set_merge, comp_merge, orig_merge
"""
for a in ['loop', 'set', 'comp', 'orig']:
time = timeit('{}_merge(old, new)'.format(a), setup=setup)
print('{}: {}'.format(a, time))
size = 10**4
large_old = {i: 'old' for i in range(size)}
large_new = {i: 'new' for i in range(size//2, size)}
setup = """
from __main__ import large_old, large_new, loop_merge, set_merge, comp_merge, orig_merge
"""
for a in ['loop', 'set', 'comp', 'orig']:
time = timeit('{}_merge(large_old, large_new)'.format(a), setup=setup)
print('{}: {}'.format(a, time))
The winner is the improved looping method!
$ python3 merge.py
loop: 0.7791572390015062 # small dictionaries
set: 3.1920828100010112
comp: 1.1180207730030816
orig: 1.1681104259987478
loop: 927.2149353210007 # large dictionaries
set: 1696.8342713210004
comp: 902.039078668
orig: 1373.0389542560006
I'm disappointed, because the dictionary view/set operation method is much cooler.
With larger dictionaries (10^4 items), the dictionary comprehension method pulls ahead of the improved looping method and far ahead of the original method. The set operation method still performs the slowest.
Upvotes: 0
Reputation: 4441
old = {
'a': 1,
'b': 2,
'c': 3
}
new = {
'c': 'x',
'd': 'y',
'e': 'z'
}
def merge_preserving_old_values_and_new_keys(o, n):
out = {}
for k in n:
if k in o:
out[k] = o[k]
else:
out[k] = n[k]
return out
print merge_preserving_old_values_and_new_keys(old, new)
Upvotes: 0
Reputation: 1123850
Dictionaries have dictionary view objects that act as sets. Use these to get the intersection between old and new:
def merge_preserving_old_values_and_new_keys(old, new):
result = new.copy()
result.update((k, old[k]) for k in old.viewkeys() & new.viewkeys())
return result
The above uses the Python 2 syntax; use old.keys() & new.keys()
if you are using Python 3, for the same results:
def merge_preserving_old_values_and_new_keys(old, new):
# Python 3 version
result = new.copy()
result.update((k, old[k]) for k in old.keys() & new.keys())
return result
The above takes all key-value pairs from new
as a starting point, then adds the values for old
for any key that appears in both.
Demo:
>>> merge_preserving_old_values_and_new_keys(old, new)
{'c': 3, 'e': 'z', 'd': 'y'}
Note that the function, like your version, produces a new dictionary (albeit that the key and value objects are shared; it is a shallow copy).
You could also just update the new dictionary in-place if you don't need that new dictionary for anything else:
def merge_preserving_old_values_and_new_keys(old, new):
new.update((k, old[k]) for k in old.viewkeys() & new.viewkeys())
return new
You could also use a one-liner dict comprehension to build a new dictionary:
def merge_preserving_old_values_and_new_keys(old, new):
return {k: old[k] if k in old else v for k, v in new.items()}
Upvotes: 5
Reputation: 1371
This should be more efficient, since you are no longer iterating through the entire old.items(). Additionally, it's more clear what you are trying to do this way since you aren't overwriting some values.
for k, v in new.items():
if k in old.keys():
out[k] = old[k]
else:
out[k] = v
return out
Upvotes: 2