Arne

Reputation: 309

Merge dictionaries without overwriting values

This seems like a simple task:

I am trying to merge two dictionaries without overwriting the values. Where a key exists in both, the values should be APPENDED.

a = {1: [(1,1)],2: [(2,2),(3,3)],3: [(4,4)]} 
b = {3: [(5,5)], 4: [(6,6)]}

number of tuples a = 4, number of tuples b = 2

I have ruled out these options because they overwrite the values:

all = dict(a.items() + b.items()) 
all = dict(a, **b)
all = a.update([b])

The following solution works just fine, BUT it also appends values to my original dictionary a:

all = {}

for k in a.keys():
    if k in all:
        all[k].append(a[k])
    else:
        all[k] = a[k]


for k in b.keys():
    if k in all:
        all[k].append(b[k])
    else:
        all[k] = b[k]

Output =

a = {1: [(1, 1)], 2: [(2, 2), (3, 3)], 3: [(4, 4), [(5, 5)]]}
b = {3: [(5, 5)], 4: [(6, 6)]}
all = {1: [(1, 1)], 2: [(2, 2), (3, 3)], 3: [(4, 4), [(5, 5)]], 4: [(6, 6)]}

number of tuples a = 5 !!!!!, number of tuples b = 2 (correct), number of tuples all = 6 (correct)

It appended the list [(5,5)] from b to a. I have no idea why this happens, because all my code does is write everything into the combined dictionary "all".

Can anyone tell me where the heck it is changing a?

Any help is greatly welcome.

Upvotes: 2

Views: 7546

Answers (3)

Matti Lyra

Reputation: 13088

If you want a third dictionary that is the combined one, I would use collections.defaultdict:

from collections import defaultdict
from itertools import chain
all = defaultdict(list)
for k, v in chain(a.items(), b.items()):  # .items() works on Python 2 and 3
    all[k].extend(v)

outputs

defaultdict(<type 'list'>, {1: [(1, 1)], 2: [(2, 2), (3, 3)], 3: [(4, 4), (5, 5)], 4: [(6, 6)]})
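
If a plain dict is preferred as the result, the same idea can be wrapped in a small helper. This is only a sketch: the name merge_lists is made up for illustration, and it assumes every value is a list.

```python
from collections import defaultdict
from itertools import chain


def merge_lists(*dicts):
    """Merge dicts of lists into a new dict, concatenating lists on shared keys."""
    merged = defaultdict(list)  # a missing key starts out as a fresh empty list
    for key, values in chain.from_iterable(d.items() for d in dicts):
        merged[key].extend(values)  # copies the items over, never stores the input list
    return dict(merged)  # plain dict: looking up a missing key raises KeyError again


a = {1: [(1, 1)], 2: [(2, 2), (3, 3)], 3: [(4, 4)]}
b = {3: [(5, 5)], 4: [(6, 6)]}
merged = merge_lists(a, b)
```

Because every list in merged is created by the defaultdict, none of them is shared with a or b, so later changes to merged leave the inputs alone.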

Upvotes: 5

Martijn Pieters

Reputation: 1123860

Use .extend instead of .append for merging lists together.

>>> example = [1, 2, 3]
>>> example.append([4, 5])
>>> example
[1, 2, 3, [4, 5]]
>>> example.extend([6, 7])
>>> example
[1, 2, 3, [4, 5], 6, 7]

Moreover, you can loop over the keys and values of both a and b together using itertools.chain:

from itertools import chain
all = {}
for k, v in chain(a.items(), b.items()):  # .items() works on Python 2 and 3
    all.setdefault(k, []).extend(v)

.setdefault() looks up a key, and sets it to a default if it is not yet there. Alternatively you could use collections.defaultdict to do the same implicitly.

outputs:

>>> a
{1: [(1, 1)], 2: [(2, 2), (3, 3)], 3: [(4, 4)]}
>>> b
{3: [(5,5)], 4: [(6,6)]}
>>> all
{1: [(1, 1)], 2: [(2, 2), (3, 3)], 3: [(4, 4), (5, 5)], 4: [(6, 6)]}

Note that because we now create a clean new list for each key first, then extend, your original lists in a are unaffected. In your code you do not create a copy of the list; you copy the reference to it. In the end, the values in both all and a point to the same lists, and appending to those lists makes the change visible in both places.

It's easy to demonstrate that with simple variables instead of a dict:

>>> foo = [1, 2, 3]
>>> bar = foo
>>> bar
[1, 2, 3]
>>> bar.append(4)
>>> foo, bar
([1, 2, 3, 4], [1, 2, 3, 4])
>>> id(foo), id(bar)
(4477098392, 4477098392)

Both foo and bar refer to the same list; the list was not copied. To create a copy instead, use the list() constructor or the [:] slice:

>>> bar = foo[:]
>>> bar.append(5)
>>> foo, bar
([1, 2, 3, 4], [1, 2, 3, 4, 5])
>>> id(foo), id(bar)
(4477098392, 4477098536)

Now bar is a new copy of the list and changes are no longer visible in foo. The identities returned by id() (memory addresses in CPython) differ for the two lists.
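
One caveat worth sketching: both list() and [:] make a shallow copy, so only the outer list is new. For lists of tuples, as in the question, that is enough because tuples cannot be modified in place. For nested mutable values, copy.deepcopy is the standard-library way to copy everything. The variable names below are only illustrative.

```python
import copy

foo = [[1, 2], [3, 4]]

shallow = list(foo)        # same effect as foo[:]: new outer list, same inner lists
deep = copy.deepcopy(foo)  # new outer list and new inner lists

shallow.append([5, 6])     # only the copy grows; foo still has two items
shallow[0].append(99)      # the inner list is shared, so foo[0] changes as well
deep[1].append(77)         # fully independent of foo
```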

Upvotes: 6

Pierre GM

Reputation: 20339

As an explanation of why your a changes, consider your loop:

for k in a.keys():
    if k in all:
        all[k].append(a[k])
    else:
        all[k] = a[k]

So, if k is not yet in all, you enter the else branch, and all[k] now points to the a[k] list. It's not a copy, it's a reference to a[k]: they are the same object. Later, when the loop over b reaches a key that is already in all, you append to all[k]; but as all[k] points to a[k], you end up appending to a[k] as well.
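
The sharing can be checked directly with the is operator. This is a reduced sketch of the two branches for the single key 3; the name merged stands in for all so that the builtin is not shadowed.

```python
a = {3: [(4, 4)]}
b = {3: [(5, 5)]}
merged = {}  # stands in for "all" from the question

merged[3] = a[3]                 # the else branch: stores a reference, not a copy
same_object = merged[3] is a[3]  # True, both names point at one list

merged[3].append(b[3])           # the if branch, reached while looping over b
# a[3] has grown too, because it is the very same list as merged[3]
```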

You want to avoid all[k] = a[k]. You could try this instead:

for k in a.keys():
    if k not in all:
        all[k] = []
    all[k].extend(a[k])

(Note the extend instead of the append, as pointed out by @Martijn Pieters). Here, you never have all[k] pointing to a[k], so you're safe. @Martijn Pieters' answer is far more concise and elegant, though, so you should go with it.
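
For completeness, here is a sketch of the same loop applied to both dictionaries, followed by the tuple counts from the question. The names merged and count_tuples are made up for this example.

```python
a = {1: [(1, 1)], 2: [(2, 2), (3, 3)], 3: [(4, 4)]}
b = {3: [(5, 5)], 4: [(6, 6)]}

merged = {}
for source in (a, b):
    for k in source:
        if k not in merged:
            merged[k] = []           # always start from a fresh list
        merged[k].extend(source[k])  # copy the tuples over one by one


def count_tuples(d):
    """Total number of tuples across all lists in d."""
    return sum(len(values) for values in d.values())
```

With this version count_tuples gives 4 for a, 2 for b and 6 for merged, so a is no longer modified.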

Upvotes: 1
