Reputation: 1426

Summing up values contained in sublist under specified condition

OK, so I am trying to get my head around what I feel should be an easy task. I am using Python 3.4.

I have the following list that contains sublists (simplified version):

newlist = [ ['John', 12],['Mary', 10],['Paul', 12],['Mary', 5],['Paul', 8],['John', 7] ]

I am trying to get the sum of all the values that correspond to each unique name. So, in regard to the above-stated list the results should read something like:

John - 19

Mary - 15

Paul - 20

What would be the fastest and/or most efficient way of achieving this?

Example of my own efforts

Right now I have solved my problem like so (but as said: I am looking for a more efficient solution):

unique_names = []
for i in newlist:
    if i[0] not in unique_names:
        unique_names.append(i[0])

valuelist = []
for name in unique_names:
    valuelist.append(name)
    yet_another_list = []
    for i in newlist:
        if name in i:
            yet_another_list.append(i[1])
    valuelist.append(sum(yet_another_list))

EDIT

- I tested the answers -

OK, so I got a lot of responses, thanks! For the record, I tested them by creating a separate function for each of the proposed solutions. I used start = time.perf_counter() and end = time.perf_counter() - start to measure the run time of each function. I placed the imports inside each function that required them.

The list I used for this test contained 3985 items / sublists.
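The timing approach described above can be sketched roughly as follows. This is only an illustration: time_call is a helper name made up here, and the six-item sample list stands in for the real 3985-item list, which is not shown in the question.

```python
import time
from collections import defaultdict


def defaultdict_try(pairs):
    # One of the candidate solutions: accumulate a total per name.
    totals = defaultdict(int)
    for name, value in pairs:
        totals[name] += value
    return totals


def time_call(func, *args):
    # Hypothetical helper: return (result, elapsed seconds) for one call.
    start = time.perf_counter()
    result = func(*args)
    elapsed = time.perf_counter() - start
    return result, elapsed


newlist = [['John', 12], ['Mary', 10], ['Paul', 12],
           ['Mary', 5], ['Paul', 8], ['John', 7]]
result, elapsed = time_call(defaultdict_try, newlist)
print(dict(result), round(elapsed, 4))
```

A single call measured this way is noisy; the standard library's timeit module, which repeats the call many times, is the usual alternative.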

The results (in seconds, rounded to 4 decimals) from 5 different runs were:

my_own_solution: 0.9800 / 0.9703 / 0.9873 / 1.0023 / 0.9540

defaultdict try: 0.0014 / 0.0016 / 0.0014 / 0.0018 / 0.0014

counter try: 0.0030 / 0.0026 / 0.0026 / 0.0027 / 0.0026

reduce_try: 0.0155 / 0.0153 / 0.0151 / 0.0149 / 0.0174

ittertry: 0.0242 / 0.0268 / 0.0239 / 0.0307 / 0.0259 (failed on floats)

valuelisttry: 0.0018 / 0.0018 / 0.0019 / 0.0020 / 0.0043

Overall, I really appreciate the simplicity of the defaultdict solution, which was also the fastest option in these runs. However, for those who dislike imports, the valuelist (actually a value dictionary) option seems like a fine choice as well.

Upvotes: 2

Answers (5)

Dimitris Fasarakis Hilliard

Reputation: 160437

The fastest approach would probably involve a Counter from collections and chain and repeat from itertools:

from collections import Counter
from itertools import chain, repeat
from_it = chain.from_iterable
c = Counter(from_it(repeat(i, j) for i,j in chain(newlist)))

Which yields:

Counter({'Paul': 20, 'John': 19, 'Mary': 15})

The generator expression unpacks every sublist of newlist with for i,j in chain(newlist), then passes the name i (e.g. John) and its value j to repeat, which yields the name that many times. The result is passed to chain.from_iterable (from_it), which flattens it into a single stream of names that Counter can tally. Because repeat needs an integer count, this only works when the values are integers.
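To make the intermediate steps visible, here is a small sketch that breaks the one-liner apart. The variable names stream and totals are chosen here for illustration and are not part of the original answer.

```python
from collections import Counter
from itertools import chain, repeat

newlist = [['John', 12], ['Mary', 10], ['Paul', 12],
           ['Mary', 5], ['Paul', 8], ['John', 7]]

# repeat(name, n) yields the same name n times.
assert list(repeat('Mary', 3)) == ['Mary', 'Mary', 'Mary']

# Chaining the repeats gives one long stream of names (54 in total here),
# which Counter then tallies.
stream = chain.from_iterable(repeat(name, n) for name, n in newlist)
totals = Counter(stream)
print(totals['John'], totals['Mary'], totals['Paul'])  # 19 15 20

# repeat() needs an integer count, so a float value raises TypeError.
try:
    repeat('John', 1.5)
except TypeError as exc:
    print('float count rejected:', exc)
```

Since every name is materialised once per unit of its value, the amount of work grows with the size of the values rather than with the length of the list.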

Upvotes: 0

Moses Koledoye

Reputation: 78556

You can use a collections.Counter object:

from collections import Counter

c = Counter()
for name, cnt in newlist:
    c[name] += cnt

print(c.most_common())
# [('Paul', 20), ('John', 19), ('Mary', 15)]

If you're into one-liners (although they are not necessarily more efficient or readable), you can use functools.reduce and pass a Counter as the initializer:

from functools import reduce

c = reduce(lambda x, y: x.update({y[0]: y[1]}) or x, newlist, Counter())
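The one-liner relies on Counter.update returning None, so the expression falls through to or x and hands the accumulator back to reduce. A sketch of the same logic with an explicit function (add_pair is a name made up for this illustration) may be easier to follow:

```python
from collections import Counter
from functools import reduce

newlist = [['John', 12], ['Mary', 10], ['Paul', 12],
           ['Mary', 5], ['Paul', 8], ['John', 7]]


def add_pair(acc, pair):
    # Counter.update adds to existing counts (dict.update would overwrite)
    # and returns None, so the accumulator is returned explicitly.
    name, value = pair
    acc.update({name: value})
    return acc


c = reduce(add_pair, newlist, Counter())
print(c['John'], c['Mary'], c['Paul'])  # 19 15 20
```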

Upvotes: 1

parsethis

Reputation: 8078

Use defaultdict:

from collections import defaultdict

values = defaultdict(int)

for x, y in newlist:
    values[x] += y

Edit: just use defaultdict(int); int is already a callable, I didn't think of that!
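A short sketch of why int works as the default factory: calling int() with no arguments returns 0, and defaultdict calls the factory whenever a missing key is accessed. The variable names below are illustrative only.

```python
from collections import defaultdict

# int() called with no arguments returns 0, so it can serve as the factory.
assert int() == 0

values = defaultdict(int)
values['John'] += 12   # missing key: created as int() == 0, then incremented
values['John'] += 7
print(values['John'])  # 19

# An equivalent but more verbose factory:
verbose = defaultdict(lambda: 0)
verbose['Mary'] += 10
print(verbose['Mary'])  # 10
```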

Upvotes: 2

timgeb

Reputation: 78690

I'd use a defaultdict.

>>> from collections import defaultdict
>>> newlist = [ ['John', 12],['Mary', 10],['Paul', 12],['Mary', 5],['Paul', 8],['John', 7] ]
>>> d = defaultdict(int)
>>> for name, score in newlist:
...     d[name] += score
... 
>>> d
defaultdict(<class 'int'>, {'Mary': 15, 'John': 19, 'Paul': 20})

Upvotes: 1

Ghilas BELHADJ

Reputation: 14096

valuelist = {}
for name, value in newlist:
    if name not in valuelist:
        valuelist[name] = 0
    valuelist[name] += value

print(valuelist)

{'Paul': 20, 'John': 19, 'Mary': 15}
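As a possible variation on the same import-free idea, dict.get with a default of 0 can replace the explicit membership check. This is a sketch of an alternative spelling, not the code that was timed in the question; totals is a name chosen here.

```python
newlist = [['John', 12], ['Mary', 10], ['Paul', 12],
           ['Mary', 5], ['Paul', 8], ['John', 7]]

totals = {}
for name, value in newlist:
    # get() returns 0 for a name that has not been seen yet.
    totals[name] = totals.get(name, 0) + value

print(totals)
```

Unlike the repeat-based approach, this also works when the values are floats, because it only uses addition.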

Upvotes: 0
