Reputation: 35
I have a Python problem that can be solved with multiple nested for loops, but I was wondering if there is an easier way to solve it, maybe by adding list items together and dropping duplicates.
My list looks like this:
main_list = [["[email protected]", "Administration", "100"],
["[email protected]", "Testing", "30"],
["[email protected]", "Development", "45"],
["[email protected]", "Development", "90"],
["[email protected]", "Development", "35"],
["[email protected]", "Development", "400"],
["[email protected]", "Administration", "95"],
["[email protected]", "Testing", "200"]]
I need to group the rows by email address and category (the first two list elements) and add the 3rd entries of the duplicates together.
So [user2, development] goes from:
["[email protected]", "Development", "45"],
["[email protected]", "Development", "90"],
["[email protected]", "Development", "35"],
to:
["[email protected]", "Development", "170"]
Is this possible with list manipulation?
Thank you!
Upvotes: 2
Views: 117
Reputation: 1
Exemplified, step by step.
main_dict = {}
for email, category, value in main_list:
    # Use the (email, category) pair as the dictionary key
    token = (email, category)
    if token in main_dict:
        main_dict[token] += int(value)
    else:
        main_dict[token] = int(value)

main_list_converted = []
# items() works on Python 3; iteritems() exists only on Python 2
for k, v in main_dict.items():
    main_list_converted.append(list(k) + [v])
main_list_converted.sort()

for item in main_list_converted:
    print(item)

# Output:
# ['[email protected]', 'Administration', 100]
# ['[email protected]', 'Development', 170]
# ['[email protected]', 'Testing', 30]
# ['[email protected]', 'Administration', 95]
# ['[email protected]', 'Development', 400]
# ['[email protected]', 'Testing', 200]
Upvotes: 0
Reputation: 1126
With the pandas module:
import pandas as pd
out_d = (pd.DataFrame(main_list).set_index([0,1])[2].astype(int).groupby(level=[0,1]).sum()).to_dict()
out_d
Out[1]:
{('[email protected]', 'Administration'): 100,
('[email protected]', 'Development'): 170,
('[email protected]', 'Testing'): 30,
('[email protected]', 'Administration'): 95,
('[email protected]', 'Development'): 400,
('[email protected]', 'Testing'): 200}
# convert the dict back to a list of lists
[[u[0],u[1],v] for u,v in out_d.items()]
Out[2]:
[['[email protected]', 'Administration', 100],
['[email protected]', 'Development', 170],
['[email protected]', 'Testing', 30],
['[email protected]', 'Administration', 95],
['[email protected]', 'Development', 400],
['[email protected]', 'Testing', 200]]
Upvotes: 0
Reputation: 17911
You can use the function groupby():
from itertools import groupby
from operator import itemgetter
iget = itemgetter(0, 1)
[[*k, sum(int(i[2]) for i in g)] for k, g in groupby(sorted(main_list), key=iget)]
Result:
[['[email protected]', 'Administration', 100],
['[email protected]', 'Development', 170],
['[email protected]', 'Testing', 30],
['[email protected]', 'Administration', 95],
['[email protected]', 'Development', 400],
['[email protected]', 'Testing', 200]]
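One caveat with this approach: groupby() only merges consecutive rows that share a key, which is why the list is sorted first. A minimal sketch of that behavior follows. The addresses (user1@example.com, user2@example.com) are placeholders rather than the ones from the question, and the helper name sum_by_key is hypothetical:

```python
from itertools import groupby
from operator import itemgetter

# Placeholder data: these addresses are illustrative, not from the question.
rows = [
    ["user2@example.com", "Development", "45"],
    ["user1@example.com", "Testing", "30"],
    ["user2@example.com", "Development", "90"],
]

key = itemgetter(0, 1)  # group on the (email, category) pair


def sum_by_key(data):
    """Sum the third column for each run of consecutive rows sharing a key."""
    return [[*k, sum(int(r[2]) for r in g)] for k, g in groupby(data, key=key)]


# Without sorting, the two user2 rows are not adjacent, so they stay separate.
unsorted_result = sum_by_key(rows)

# Sorting on the same key makes equal keys adjacent, so they merge.
sorted_result = sum_by_key(sorted(rows, key=key))

print(unsorted_result)
print(sorted_result)
```

Sorting with the same key function used for grouping, rather than sorting whole rows, avoids comparing the numeric strings lexicographically. The sums are the same either way.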
Upvotes: 1
Reputation: 82815
Using collections.defaultdict
Ex:
from collections import defaultdict
main_list = [["[email protected]", "Administration", "100"],
["[email protected]", "Testing", "30"],
["[email protected]", "Development", "45"],
["[email protected]", "Development", "90"],
["[email protected]", "Development", "35"],
["[email protected]", "Development", "400"],
["[email protected]", "Administration", "95"],
["[email protected]", "Testing", "200"]]
result = defaultdict(int)
for k, v, n in main_list:
    result[(k, v)] += int(n)
result = [list(k) + [v] for k, v in result.items()]
print(result)
Output:
[['[email protected]', 'Administration', 100],
['[email protected]', 'Testing', 30],
['[email protected]', 'Development', 170],
['[email protected]', 'Development', 400],
['[email protected]', 'Administration', 95],
['[email protected]', 'Testing', 200]]
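The question's expected output keeps the total as a string ("170"), while the code above produces integers. If the string form matters, a small variation could convert the totals back. This is only a sketch: the addresses are placeholders (not the ones from the question) and the function name merge_rows is hypothetical:

```python
from collections import defaultdict


def merge_rows(rows):
    """Sum the third column per (email, category) and return totals as strings."""
    totals = defaultdict(int)
    for email, category, amount in rows:
        totals[(email, category)] += int(amount)
    # Dicts keep insertion order on Python 3.7+, so groups appear in
    # first-seen order.
    return [
        [email, category, str(total)]
        for (email, category), total in totals.items()
    ]


# Placeholder data: these addresses are illustrative, not from the question.
sample = [
    ["user2@example.com", "Development", "45"],
    ["user1@example.com", "Testing", "30"],
    ["user2@example.com", "Development", "90"],
    ["user2@example.com", "Development", "35"],
]

merged = merge_rows(sample)
print(merged)
```

Call sorted(merged) afterwards if a sorted result is preferred over first-seen order.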
Upvotes: 4