Reputation: 8903
I am using Python 2.7 and am trying to de-duplicate a list of lists and merge the values of the duplicates.
Right now I have:
original_list = [['a', 1], ['b', 1], ['a', 1], ['b', 1], ['b', 2], ['c', 2], ['b', 3]]
I want to match on the first element of each nested list and then add the values of the second element. I want to end up with this (the order of the final list does not matter):
ideal_output = [['a', 2], ['b', 7], ['c', 2]]
So far I have some code that will find me the duplicate values based on the first element of each nested list:
for item in original_list:
matches = -1
for x in original_list:
if (item[0] == x[0]):
matches += 1
if matches >= 1:
if item[0] not in duplicates_list:
duplicates_list.append(item[0])
From here I need to search for all duplicates_list items that are in original_list and add up the values, but I am not sure what the best way to do that is.
Upvotes: 29
Views: 30985
Reputation: 95375
Lots of good answers, but they all use rather more code than I would for this, so here's my take, for what it's worth:
totals = {}
for k,v in original_list:
totals[k] = totals.get(k,0) + v
# totals = {'a': 2, 'c': 2, 'b': 7}
Once you have a dict like that, from any of these answers, you can use items
to get a(n object that acts like a) list of tuples:
totals.items()
# => dict_items([('a', 2), ('c', 2), ('b', 7)])
And run list
across the tuples to get a list of lists:
[list(t) for t in totals.items()]
# => [['a', 2], ['c', 2], ['b', 7]]
And sort if you want them in order:
sorted([list(t) for t in totals.items()])
# => [['a', 2], ['b', 7], ['c', 2]]
Upvotes: 33
Reputation: 48357
Use collections.Counter
:
from collections import Counter
original_list = [['a', 1], ['b', 1], ['a', 1], ['b', 1], ['b', 2], ['c', 2], ['b', 3]]
result = Counter()
for k, v in original_list:
result.update({k:v})
map(list, result.items())
# [['a', 2], ['c', 2], ['b', 7]]
So, lot of answers, views and upvotes. I even earned my first Nice answer
out of nothing (in last 2 days I made lot of answers worth of more research and efforts). In view of this, I decided to do at least some research and test solutions performance with a simple script written from scratch. Do not include code directly in answer for the sake of size.
Each function is named for it's author an easily can be found in question. thefourtheye
's solution now equals one of Mark Reed and is evaluated in original form, thefourtheye2 states for itertools.groupby
based solution.
Each was tested several times (samples), each sample in turn invoked several function iterations. I evaluated min, max and standard deviation for samples times.
Here we go, running probing test for 10 times.
testing: thefourtheye, kroolik2, void, kroolik, alko, reed, visser
10 samples
10 iterations each
author min avg max stddev
reed 0.00000 0.00000 0.00000 0.00000
visser 0.00000 0.00150 0.01500 0.00450
thefourtheye 0.00000 0.00160 0.01600 0.00480
thefourtheye2 0.00000 0.00310 0.01600 0.00620
alko 0.00000 0.00630 0.01600 0.00772
void 0.01500 0.01540 0.01600 0.00049
kroolik2 0.04700 0.06430 0.07800 0.00831
kroolik 0.32800 0.34380 0.37500 0.01716
Look at bottom two rows: at this point kroolik solutions were disqualified since with it any reasonable amount of samples*iterations will be performed for hours. Here goes final tests. I manually added number of upvotes to ouptut:
testing: thefourtheye, kroolik2, void, kroolik, alko, reed, visser
100 samples
1000 iterations each
author upvotes min avg max stddev
reed [20] 0.06200 0.08174 0.15600 0.01841
thefourtheye [5] 0.06200 0.09971 0.20300 0.01911
visser [6] 0.10900 0.12392 0.23500 0.02263
thefourtheye2 0.25000 0.29674 0.89000 0.07183
alko [11] 0.56200 0.62309 1.04700 0.08438
void [3] 1.50000 1.65480 2.39100 0.18721
kroolik [14] [DSQ]
Upvotes: 13
Reputation: 179
May be you can try this too,
>>> x = [[1,1],[2,2],[1,1],[2,2],[3,3],[4,4],[4,4]]
>>> z = []
>>> for i in x:
>>> if i not in z:
>>> z.append(i)
>>>
>>> z
[[1, 1], [2, 2], [3, 3], [4, 4]]
Upvotes: 4
Reputation:
I know it's ugly but I was having some fun trying to implement it in a 1 liner:
map(list, set(([(x[0], sum([i[1] for i in original_list if i[0]==x[0]])) for x in original_list])))
output:
[['a', 2], ['b', 7], ['c', 2]]
Upvotes: 5
Reputation: 239653
If the order doesnt matter, you can use this
original_list = [['a', 1], ['b', 1], ['a', 1], ['b', 1], ['b', 2], ['c', 2], ['b', 3]]
myDict = {}
for first, second in original_list:
myDict[first] = myDict.get(first, 0) + second
result = [[key, value] for key, value in myDict.items()]
print result
Or you can use groupby and the code becomes a oneliner
original_list = [['a', 1], ['b', 1], ['a', 1], ['b', 1], ['b', 2], ['c', 2], ['b', 3]]
from itertools import groupby
print [[key, sum(item[1] for item in list(group))]
for key, group in groupby(sorted(original_list), lambda x:x[0])]
Output
[['a', 2], ['b', 7], ['c', 2]]
Upvotes: 10
Reputation: 15864
>>> from collections import Counter
>>> lst = [['a', 1], ['b', 1], ['a', 1], ['b', 1], ['b', 2], ['c', 2], ['b', 3]]
>>> c = Counter(x for x, c in lst for _ in xrange(c))
Counter({'b': 7, 'a': 2, 'c': 2})
>>> map(list, c.iteritems())
[['a', 2], ['c', 2], ['b', 7]]
Or alternatively, without repeating each item (a, b)
b times (@hcwhsa):
>>> from collections import Counter
>>> lst = [['a', 1], ['b', 1], ['a', 1], ['b', 1], ['b', 2], ['c', 2], ['b', 3]]
>>> c = sum((Counter(**{k:v}) for k, v in lst), Counter())
Counter({'b': 7, 'a': 2, 'c': 2})
>>> map(list, c.iteritems())
[['a', 2], ['c', 2], ['b', 7]]
Upvotes: 15
Reputation: 122506
You can use collections.defaultdict
:
original_list = [['a', 1], ['b', 1], ['a', 1], ['b', 1], ['b', 2], ['c', 2], ['b', 3]]
import collections
data = collections.defaultdict(list)
for item in original_list:
data[item[0]].append(item[1])
output = {key: sum(values) for key, values in data.items()}
print output
# gives: {'a': 2, 'c': 2, 'b': 7}
Upvotes: 5