user1403568
user1403568

Reputation: 493

Removing duplicate keys from python dictionary but summing the values

I have a dictionary in python

d = {tags[0]: value, tags[1]: value, tags[2]: value, tags[3]: value, tags[4]: value}

imagine that this dict is 10 times bigger, it has 50 keys and 50 values. Duplicates can be found in this tags but even then values are essential. How can I simply trimm it to recive new dict without duplicates of keys but with summ of values instead?

d = {'cat': 5, 'dog': 9, 'cat': 4, 'parrot': 6, 'cat': 6}

result

d = {'cat': 15, 'dog': 9, 'parrot': 6}

Upvotes: 3

Views: 9772

Answers (8)

Daniel
Daniel

Reputation: 1110

This the perfect situation for using a Counter data structure. Let's take a look at what it does on few familiar data structures:

>>> from collections import Counter
>>> list_a = ["A", "A", "B", "C", "C", "A", "D"]
>>> list_b = ["B", "A", "B", "C", "C", "C", "D"]
>>> c1 = Counter(list_a)
>>> c2 = Counter(list_b)
>>> c1
Counter({'A': 3, 'C': 2, 'B': 1, 'D': 1})
>>> c2
Counter({'C': 3, 'B': 2, 'A': 1, 'D': 1})
>>> c1 - c2
Counter({'A': 2})
>>> c1 + c2
Counter({'C': 5, 'A': 4, 'B': 3, 'D': 2})
>>> c_diff = c1 - c2
>>> c_diff.update([77, 77, -99, 0, 0, 0])
>>> c_diff
Counter({0: 3, 'A': 2, 77: 2, -99: 1})

As you can see this behaves as a set that keeps the count of element occurrences as a value. However, the dictionary in itself is a set-like structure where for values we don't have to have numbers, so the things get more interesting:

>>> dic1 = {"A":"a", "B":"b"}
>>> cd = Counter(dic1)
>>> cd
Counter({'B': 'b', 'A': 'a'})
>>> cd.update(B='bB123')
>>> cd
Counter({'B': 'bbB123', 'A': 'a'})


>>> dic2 = {"A":[1,2], "B": ("a", 5)}
>>> cd2 = Counter(dic2)
>>> cd2
Counter({'B': ('a', 5), 'A': [1, 2]})
>>> cd2.update(A=[42], B=(2,2))
>>> cd2
Counter({'B': ('a', 5, 2, 2), 'A': [1, 2, 42, 42, 42, 42]})
>>> cd2 = Counter(dic2)
>>> cd2
Counter({'B': ('a', 5), 'A': [1, 2]})
>>> cd2.update(A=[42], B=("new elem",))
>>> cd2
Counter({'B': ('a', 5, 'new elem'), 'A': [1, 2, 42]})

As we can see the value we are adding/changing has to be of the same type in update or it throws TypeError.

For the situation we have in the question, we can just go with the flow

>>> d = {'cat': 5, 'dog': 9, 'cat': 4, 'parrot': 6, 'cat': 6}
>>> cd3 = Counter(d)
>>> cd3
Counter({'dog': 9, 'parrot': 6, 'cat': 6})
>>> cd3.update(parrot=123)
>>> cd3
Counter({'parrot': 129, 'dog': 9, 'cat': 6})

Upvotes: 2

user1202466
user1202466

Reputation: 33

If I understand correctly your question that you want to get rid of duplicate key data, use update function of dictionary while creating the dictionary. it will overwrite the data if the key is duplicate.

tps = [('cat',5),('dog',9),('cat',4),('parrot',6),('cat',6)]
result = {}
for k, v in tps:
    result.update({k:v})
for k in result:
    print "%s: %s" % (k, result[k]) 

Output will look like: dog: 9 parrot: 6 cat: 6

Upvotes: 0

Paul Seeb
Paul Seeb

Reputation: 6146

Instead of just doing dict of those things (can't have multiples of same key in a dict) I assume you can have them in a list of tuple pairs. Then it is just as easy as

tps = [('cat',5),('dog',9),('cat',4),('parrot',6),('cat',6)]
result = {}
for k,v in tps:
    try:
        result[k] += v
    except KeyError:
        result[k] = v

>>> result
{'dog': 9, 'parrot': 6, 'cat': 15}

changed mine to more explicit try-except handling. Alfe's is very concise though

Upvotes: 2

Takis
Takis

Reputation: 726

I'm not sure what you're trying to achieve, but the Counter class might be helpful for what you're trying to do: http://docs.python.org/dev/library/collections.html#collections.Counter

Upvotes: 1

Akavall
Akavall

Reputation: 86188

tps = [('cat',5),('dog',9),('cat',4),('parrot',6),('cat',6)]

from collections import defaultdict

dicto = defaultdict(int)

for k,v in tps:
    dicto[k] += v

Result:

>>> dicto
defaultdict(<type 'int'>, {'dog': 9, 'parrot': 6, 'cat': 15})

Upvotes: 6

lukecampbell
lukecampbell

Reputation: 15256

Perhapse what you really want is a tuple of key-value pairs.

[('dog',1), ('cat',2), ('cat',3)]

Upvotes: 1

Alfe
Alfe

Reputation: 59436

I'd like to improve Paul Seeb's answer:

tps = [('cat',5),('dog',9),('cat',4),('parrot',6),('cat',6)]
result = {}
for k, v in tps:
  result[k] = result.get(k, 0) + v

Upvotes: 7

alejo8591
alejo8591

Reputation: 31

This option serves but is done with a list, or best can provide insight

data = []
        for i, j in query.iteritems():
            data.append(int(j))    
        try:
            data.sort()
        except TypeError:
            del data
        data_array = []
        for x in data:
            if x not in data_array:
                data_array.append(x)  
        return data_array

Upvotes: 0

Related Questions