Reputation: 1426
OK, so I am trying to get my head around what I feel should be an easy task. I am using Python 3.4.
I have the following list that contains sublists (simplified version):
newlist = [ ['John', 12],['Mary', 10],['Paul', 12],['Mary', 5],['Paul', 8],['John', 7] ]
I am trying to get the sum of all the values that correspond to each unique name. So, in regard to the above-stated list the results should read something like:
John - 19
Mary - 15
Paul - 20
What would be the fastest and/or most efficient way of achieving this?
Example of my own efforts
Right now I have solved my problem like so (but as said: I am looking for a more efficient solution):
unique_names = []
for i in newlist:
    if i[0] not in unique_names:
        unique_names.append(i[0])
valuelist = []
for name in unique_names:
    valuelist.append(name)
    yet_another_list = []
    for i in newlist:
        if name in i:
            yet_another_list.append(i[1])
    valuelist.append(sum(yet_another_list))
EDIT
- I tested the answers -
OK, so I got a lot of responses, thanks! For the record, I tested them by creating a separate function for each of the proposed solutions. I used start = time.perf_counter() before each call and end = time.perf_counter() - start after it to measure the running time of each function. I encapsulated the imports within each function that required them.
The list I used for this test contained 3985 items / sublists.
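For reference, a minimal sketch of that kind of timing harness might look like the following. The helper names time_once and sum_by_name and the repeated sample data are illustrative assumptions, not the exact functions or list used in the test.

```python
import time
from collections import defaultdict

def sum_by_name(pairs):
    """Sum the values for each name using defaultdict(int)."""
    totals = defaultdict(int)
    for name, value in pairs:
        totals[name] += value
    return totals

def time_once(func, data):
    """Return (elapsed seconds, result) for a single call of func(data)."""
    start = time.perf_counter()
    result = func(data)
    elapsed = time.perf_counter() - start
    return elapsed, result

# Hypothetical test data: the sample list repeated to get a few thousand rows.
sample = [['John', 12], ['Mary', 10], ['Paul', 12],
          ['Mary', 5], ['Paul', 8], ['John', 7]]
data = sample * 664  # 3984 sublists, close to the size used in the test

elapsed, totals = time_once(sum_by_name, data)
print(round(elapsed, 4), dict(totals))
```

A single perf_counter measurement is noisy, which is why several runs per function are reported; the standard library's timeit module is another option for repeated measurements.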
The results in seconds (rounded to 4 decimals) from 5 different runs were:
my_own_solution: 0.9800 / 0.9703 / 0.9873 / 1.0023 / 0.9540
defaultdict try: 0.0014 / 0.0016 / 0.0014 / 0.0018 / 0.0014
counter try: 0.0030 / 0.0026 / 0.0026 / 0.0027 / 0.0026
reduce_try: 0.0155 / 0.0153 / 0.0151 / 0.0149 / 0.0174
ittertry: 0.0242 / 0.0268 / 0.0239 / 0.0307 / 0.0259 (failed on floats)
valuelisttry: 0.0018 / 0.0018 / 0.0019 / 0.0020 / 0.0043
Overall, I really appreciate the simplicity of the defaultdict approach, which also seems to be the fastest option. However, for those who dislike imports, the valuelist (actually a value dictionary) option seems like a fine choice as well.
Upvotes: 2
Views: 102
Reputation: 160437
The fastest approach would probably involve a Counter from collections and chain and repeat from itertools:
from collections import Counter
from itertools import chain, repeat
from_it = chain.from_iterable
c = Counter(from_it(repeat(i, j) for i,j in chain(newlist)))
Which yields:
Counter({'Paul': 20, 'John': 19, 'Mary': 15})
The statement unpacks every sublist of newlist with for i,j in chain(newlist) and then feeds the string i (e.g. John) along with its count j to repeat, so that the name is repeated that number of times. This generator expression is then passed to chain.from_iterable (from_it) so it can be supplied as input to Counter. Because repeat needs an integer count, this only gives correct sums when the values are non-negative integers.
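To make the expansion step easier to see, here is a small sketch of what the chained iterable looks like before Counter consumes it. The list named small is made up for illustration; the rest uses the same calls as the statement in this answer.

```python
from collections import Counter
from itertools import chain, repeat

newlist = [['John', 12], ['Mary', 10], ['Paul', 12],
           ['Mary', 5], ['Paul', 8], ['John', 7]]

# Illustrative small input so the expanded sequence stays readable.
small = [['John', 2], ['Mary', 1], ['John', 1]]
expanded = list(chain.from_iterable(repeat(name, n) for name, n in small))
print(expanded)  # ['John', 'John', 'Mary', 'John']

# Counter then just counts how often each name occurs in the expanded stream.
c = Counter(chain.from_iterable(repeat(name, n) for name, n in newlist))
print(c['John'], c['Mary'], c['Paul'])  # 19 15 20

# repeat() only accepts an integer count, so float values are rejected.
float_rejected = False
try:
    list(repeat('John', 1.5))
except TypeError:
    float_rejected = True
print(float_rejected)  # True
```

Since every name is materialised once per unit of its value, the amount of work grows with the size of the values and not just with the length of the list.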
Upvotes: 0
Reputation: 78556
You can use a collections.Counter object:
from collections import Counter
c = Counter()
for name, cnt in newlist:
    c[name] += cnt
print(c.items())
# dict_items([('John', 19), ('Mary', 15), ('Paul', 20)])  (order may vary before Python 3.7)
If you're into one-liners (although not necessarily more efficient or readable) you can use functools.reduce and pass a Counter as the initializer:
from functools import reduce
c = reduce(lambda x, y: x.update({y[0]: y[1]}) or x, newlist, Counter())
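The one-liner relies on two details of Counter.update: it adds to existing counts where a plain dict would overwrite them, and it returns None, hence the "or x" in the lambda. A short sketch of both, using made-up values:

```python
from collections import Counter

c = Counter({'John': 12})
result = c.update({'John': 7})   # adds 7 to the existing count
print(c['John'])                 # 19
print(result)                    # None, so "update(...) or x" evaluates to x

d = {'John': 12}
d.update({'John': 7})            # a plain dict overwrites the old value
print(d['John'])                 # 7
```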
Upvotes: 1
Reputation: 8078
Use defaultdict:
from collections import defaultdict
values = defaultdict(int)
for x, y in newlist:
    values[x] += y
Edit: just use defaultdict(int); int is already a callable, I didn't think of that!
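A brief sketch of why int works as the default factory: calling int() with no arguments returns 0, and defaultdict calls the factory whenever a missing key is looked up. The key 'Nobody' is only there for illustration.

```python
from collections import defaultdict

print(int())  # 0

values = defaultdict(int)
values['John'] += 12   # missing key: int() supplies 0, then 12 is added
values['John'] += 7
print(values['John'])  # 19

# Reading a missing key also inserts it with the default value.
print(values['Nobody'])     # 0
print('Nobody' in values)   # True
```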
Upvotes: 2
Reputation: 78690
I'd use a defaultdict.
>>> from collections import defaultdict
>>> newlist = [ ['John', 12],['Mary', 10],['Paul', 12],['Mary', 5],['Paul', 8],['John', 7] ]
>>> d = defaultdict(int)
>>> for name, score in newlist:
...     d[name] += score
...
>>> d
defaultdict(<class 'int'>, {'Mary': 15, 'John': 19, 'Paul': 20})
Upvotes: 1
Reputation: 14096
valuelist = {}
for (name, value) in newlist:
    if name not in valuelist:
        valuelist[name] = 0
    valuelist[name] += value
print(valuelist)
{'Paul': 20, 'John': 19, 'Mary': 15}
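As a possible variant of the same plain-dict idea (not part of the code above), dict.get with a default of 0 folds the membership check and the addition into one line. The name totals is just an illustrative choice.

```python
newlist = [['John', 12], ['Mary', 10], ['Paul', 12],
           ['Mary', 5], ['Paul', 8], ['John', 7]]

totals = {}
for name, value in newlist:
    # get() returns 0 for names not seen yet, so no explicit check is needed
    totals[name] = totals.get(name, 0) + value

print(totals == {'John': 19, 'Mary': 15, 'Paul': 20})  # True
```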
Upvotes: 0