AlexCh

Reputation: 11

Operation similar to group by for lists

I have lists of ids and scores:

ids=[1,2,1,1,3,1]
scores=[10,20,10,30,40,10]

I want to remove duplicates from the ids list and sum the scores that belong to each id. This is very similar to what groupby.sum() does with dataframes.

So, as output I expect:

ids=[1,2,3]
scores=[60,20,40]

I use the following code but it doesn't work well for all cases:

for indi ,i in enumerate(ids):
     for indj ,j in enumerate(ids):
           if(i==j) and (indi!=indj):
                  del ids[i]
                  scores[indj]=scores[indi]+scores[indj]
                  del scores[indi]

Upvotes: 0

Views: 53

Answers (6)

Daweo

Reputation: 36838

Using only built-in Python tools, I would do the task the following way:

ids=[1,2,1,1,3,1]
scores=[10,20,10,30,40,10]
uids = list(set(ids)) # unique ids
for uid in uids:
    print(uid,sum(s for inx,s in enumerate(scores) if ids[inx]==uid))

Output:

1 60
2 20
3 40

The code above just prints the result, but it can easily be changed to produce a dict:

output_dict = {uid:sum(s for inx,s in enumerate(scores) if ids[inx]==uid) for uid in uids} # {1: 60, 2: 20, 3: 40}

or another data structure. Keep in mind that this method requires a separate pass over the data for every unique id (roughly n * k steps for n items and k unique ids), so it might be slower than other approaches. Whether that is a problem depends on how big your data is.
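
For comparison, here is a minimal sketch of a single-pass variant that avoids rescanning the lists for every unique id. It is only one possible way to do it; the name totals is chosen for illustration and is not part of the code above.

```python
ids = [1, 2, 1, 1, 3, 1]
scores = [10, 20, 10, 30, 40, 10]

# totals (illustrative name) maps each id to its running sum
totals = {}
for i, s in zip(ids, scores):
    # dict.get returns 0 the first time an id is seen
    totals[i] = totals.get(i, 0) + s

print(totals)  # {1: 60, 2: 20, 3: 40}
```

Each element is visited once, so the work grows with the length of the lists rather than with the number of unique ids.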

Upvotes: 0

Preetham

Reputation: 577

This may help you.

#  Solution 1
import pandas as pd

ids=[1,2,1,1,3,1]
scores=[10,20,10,30,40,10]

df = pd.DataFrame(list(zip(ids, scores)),
                  columns=['ids', 'scores'])


print(df.groupby('ids').sum())

#### Output  ####

     scores
ids        
1        60
2        20
3        40


#  Solution 2
from itertools import groupby
zipped_list  = list(zip(ids, scores))
print([[k, sum(v for _, v in g)] for k, g in groupby(sorted(zipped_list), key = lambda x: x[0])])

#### Output  ####

[[1, 60], [2, 20], [3, 40]]

Upvotes: 0

fizzybear

Reputation: 1227

# Find all unique ids and keep track of their scores
id_to_score = {id : 0 for id in set(ids)}

# Sum up the scores for that id
for index, id in enumerate(ids):
    id_to_score[id] += scores[index]

unique_ids = []
score_sum = []
for (i, s) in id_to_score.items():
    unique_ids.append(i)
    score_sum.append(s)

print(unique_ids) # [1, 2, 3]
print(score_sum)  # [60, 20, 40]

Upvotes: 0

Ricky Kim

Reputation: 2022

Simply loop through them and add if the ids match.

ids=[1,2,1,1,3,1]
scores=[10,20,10,30,40,10]
ans={}
for i,s in zip(ids,scores):
    if i in ans:
        ans[i]+=s
    else:
        ans[i]=s
ids, scores=list(ans.keys()), list(ans.values())

Output:

[1, 2, 3]
[60, 20, 40]

Upvotes: 0

Alexandre B.

Reputation: 5500

As suggested in the comments, using a dictionary is one way. You can iterate over the lists once and update the sum per id.

If you want two lists at the end, get the keys and values with the dictionary's keys() and values() methods:

ids=[1,2,1,1,3,1]
scores=[10,20,10,30,40,10]

# Init the dict with all ids at 0
dict_ = {i: 0 for i in set(ids)}
# Use a distinct loop variable so the scores list is not overwritten
for id_, score in zip(ids, scores):
    dict_[id_] += score

print(dict_)
# {1: 60, 2: 20, 3: 40}

new_ids = list(dict_.keys())
sum_score = list(dict_.values())
print(new_ids)
# [1, 2, 3]
print(sum_score)
# [60, 20, 40]

Upvotes: 0

Devesh Kumar Singh

Reputation: 20500

You can build a dictionary from ids and scores in which each key is an element of ids and each value is the list of scores that belong to that id. You can then sum each list to get your new ids and scores lists:

from collections import defaultdict

ids=[1,2,1,1,3,1]
scores=[10,20,10,30,40,10]

dct = defaultdict(list)

#Create the dictionary of element of ids vs list of elements of scores
for id, score in zip(ids, scores):
    dct[id].append(score)

print(dct)
#defaultdict(<class 'list'>, {1: [10, 10, 30, 10], 2: [20], 3: [40]})

#Calculate the sum of values, and get the new ids and scores list
new_ids, new_scores = zip(*((key, sum(value)) for key, value in dct.items()))

print(list(new_ids))
print(list(new_scores))

The output will be

[1, 2, 3]
[60, 20, 40]
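
If the intermediate lists of scores are not needed, the same idea can be sketched with defaultdict(int), which accumulates the sums directly. This is a variation under that assumption, not a replacement for the code above; the function name group_sum is hypothetical.

```python
from collections import defaultdict


def group_sum(ids, scores):
    """Return (unique ids, summed scores), in first-seen order of the ids."""
    totals = defaultdict(int)  # missing keys start at 0
    for key, score in zip(ids, scores):
        totals[key] += score
    return list(totals), list(totals.values())


new_ids, new_scores = group_sum([1, 2, 1, 1, 3, 1], [10, 20, 10, 30, 40, 10])
print(new_ids)     # [1, 2, 3]
print(new_scores)  # [60, 20, 40]
```

Because dictionaries keep insertion order (guaranteed since Python 3.7), the ids come back in the order they first appear in the input.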

Upvotes: 1
