Reputation: 11
I have lists of ids and scores:
ids=[1,2,1,1,3,1]
scores=[10,20,10,30,40,10]
I want to remove the duplicates from the list ids and sum the scores that belong to each id. This is very similar to what groupby.sum() does with dataframes.
So, as output I expect:
ids=[1,2,3]
scores=[60,20,40]
I use the following code but it doesn't work well for all cases:
for indi ,i in enumerate(ids):
    for indj ,j in enumerate(ids):
        if(i==j) and (indi!=indj):
            del ids[i]
            scores[indj]=scores[indi]+scores[indj]
            del scores[indi]
Upvotes: 0
Views: 53
Reputation: 36838
With only built-in Python tools, I would do the task the following way:
ids=[1,2,1,1,3,1]
scores=[10,20,10,30,40,10]
uids = list(set(ids)) # unique ids
for uid in uids:
    print(uid,sum(s for inx,s in enumerate(scores) if ids[inx]==uid))
Output:
1 60
2 20
3 40
The code above just prints the result, but it can easily be changed to build a dict:
output_dict = {uid:sum(s for inx,s in enumerate(scores) if ids[inx]==uid) for uid in uids} # {1: 60, 2: 20, 3: 40}
or another data structure. Keep in mind that this method requires a separate pass over the data for every unique id, so it might be slower than other approaches. Whether that is an issue depends on how big your data is.
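One caveat with list(set(ids)): a set does not guarantee any particular order, so the unique ids may not come out in the order they first appear. A minimal sketch of the same multi-pass idea that keeps first-seen order, assuming Python 3.7+ where dicts preserve insertion order (the name totals is only illustrative):

```python
ids = [1, 2, 1, 1, 3, 1]
scores = [10, 20, 10, 30, 40, 10]

# dict.fromkeys keeps one key per id, in order of first appearance
uids = list(dict.fromkeys(ids))

# Still one pass over the data per unique id, as in the code above
totals = [sum(s for i, s in zip(ids, scores) if i == uid) for uid in uids]

print(uids)    # [1, 2, 3]
print(totals)  # [60, 20, 40]
```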
Upvotes: 0
Reputation: 577
This may help you.
# Solution 1
import pandas as pd
ids=[1,2,1,1,3,1]
scores=[10,20,10,30,40,10]
df = pd.DataFrame(list(zip(ids, scores)),
                  columns=['ids', 'scores'])
print(df.groupby('ids').sum())
#### Output ####
     scores
ids
1        60
2        20
3        40
# Solution 2
from itertools import groupby
zipped_list = list(zip(ids, scores))
print([[k, sum(v for _, v in g)] for k, g in groupby(sorted(zipped_list), key = lambda x: x[0])])
#### Output ####
[[1, 60], [2, 20], [3, 40]]
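A note on why sorted() is needed in Solution 2: itertools.groupby only groups consecutive items with equal keys, so unsorted input produces several partial groups for the same id. A small sketch of the difference (the variable names here are only illustrative):

```python
from itertools import groupby

ids = [1, 2, 1, 1, 3, 1]
scores = [10, 20, 10, 30, 40, 10]
pairs = list(zip(ids, scores))

def sum_groups(items):
    # Sum the second element of each run of consecutive equal first elements
    return [(k, sum(v for _, v in g)) for k, g in groupby(items, key=lambda p: p[0])]

without_sort = sum_groups(pairs)        # id 1 is split into three runs
with_sort = sum_groups(sorted(pairs))   # each id forms exactly one run

print(without_sort)  # [(1, 10), (2, 20), (1, 40), (3, 40), (1, 10)]
print(with_sort)     # [(1, 60), (2, 20), (3, 40)]
```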
Upvotes: 0
Reputation: 1227
# Find all unique ids and keep track of their scores
id_to_score = {id : 0 for id in set(ids)}
# Sum up the scores for that id
for index, id in enumerate(ids):
    id_to_score[id] += scores[index]
unique_ids = []
score_sum = []
for (i, s) in id_to_score.items():
    unique_ids.append(i)
    score_sum.append(s)
print(unique_ids) # [1, 2, 3]
print(score_sum) # [60, 20, 40]
Upvotes: 0
Reputation: 2022
Simply loop through them and add if the ids match.
ids=[1,2,1,1,3,1]
scores=[10,20,10,30,40,10]
ans={}
for i,s in zip(ids,scores):
    if i in ans:
        ans[i]+=s
    else:
        ans[i]=s
ids, scores=list(ans.keys()), list(ans.values())
Output:
[1, 2, 3]
[60, 20, 40]
Upvotes: 0
Reputation: 5500
As suggested in comments, using a dictionary is one way. You can iterate one time over the list and update the sum per id.
If you want two lists at the end, select the keys and values with the keys() and values() methods of the dictionary:
ids=[1,2,1,1,3,1]
scores=[10,20,10,30,40,10]
# Init the dict with all ids at 0
dict_ = {i:0 for i in set(ids)}
# Use a separate loop name (score) so the scores list is not overwritten
for id, score in zip(ids, scores):
    dict_[id] += score
print(dict_)
# {1: 60, 2: 20, 3: 40}
new_ids = list(dict_.keys())
sum_score = list(dict_.values())
print(new_ids)
# [1, 2, 3]
print(sum_score)
# [60, 20, 40]
Upvotes: 0
Reputation: 20500
You can create a dictionary from ids and scores, with each element of ids as a key and the list of corresponding elements of scores as its value. You can then sum up the values to get your new ids and scores lists:
from collections import defaultdict
ids=[1,2,1,1,3,1]
scores=[10,20,10,30,40,10]
dct = defaultdict(list)
#Create the dictionary of element of ids vs list of elements of scores
for id, score in zip(ids, scores):
    dct[id].append(score)
print(dct)
#defaultdict(<class 'list'>, {1: [10, 10, 30, 10], 2: [20], 3: [40]})
#Calculate the sum of values, and get the new ids and scores list
new_ids, new_scores = zip(*((key, sum(value)) for key, value in dct.items()))
print(list(new_ids))
print(list(new_scores))
The output will be
[1, 2, 3]
[60, 20, 40]
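If the intermediate lists of scores are not needed, a possible variation is to accumulate the sums directly with defaultdict(int). This is only a sketch of that alternative, not the code above; the names totals and id_ are illustrative, and the output order relies on dicts preserving insertion order (Python 3.7+):

```python
from collections import defaultdict

ids = [1, 2, 1, 1, 3, 1]
scores = [10, 20, 10, 30, 40, 10]

# Missing keys start at int() == 0, so no explicit initialisation is needed
totals = defaultdict(int)
for id_, score in zip(ids, scores):
    totals[id_] += score

new_ids = list(totals)              # keys in order of first appearance
new_scores = list(totals.values())  # matching sums

print(new_ids)     # [1, 2, 3]
print(new_scores)  # [60, 20, 40]
```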
Upvotes: 1