MattE
MattE

Reputation: 159

Groupby and count/sum with list of tuples in python?

I have a list of tuples:

data = [('Team1', 'Mark Owen', 40),
('Team1', 'John Doe', 25), 
('Team2', 'Raj Patel', 40), 
('Team3', 'Matt Le Blanc', 30), 
('Team1', 'Rene Russo', 40), 
('Team1', 'Ronald Regan', 40), 
('Team3', 'Dean Saunders', 15), 
('Team2', 'Michael Antonio', 30)]

I would like to groupby Team (index[0] of each tuple), count the number or persons in each team (index[1]) and sum the numbers related to each Team (index[2]) but I cannot quite figure this out. So far I have tried using defaultdict(list) which returns a dict, for example I've tried this to group by Team:

def create_hrs_totals():  
    result = defaultdict(list)
    for k, *v in data():
        result[k] += v
    return dict(result)

but then I am struggling with working with that output to achieve what I need using a list comp or whatever... The result I am looking for is a new list:

[Team1, 4, 145, 
Team2, 2, 80, 
Team3, 2, 70]

Is there a better way of doing this?

Upvotes: 3

Views: 1849

Answers (4)

jfs
jfs

Reputation: 414325

If you need to do different variations of this often, a pivot table could be used e.g. pandas.pivot_table():

>>> import numpy as np
>>> import pandas ad pd
>>> df = pd.DataFrame(data, columns=['team', 'person', 'number'])
>>> df
    team           person  number
0  Team1        Mark Owen      40
1  Team1         John Doe      25
2  Team2        Raj Patel      40
3  Team3    Matt Le Blanc      30
4  Team1       Rene Russo      40
5  Team1     Ronald Regan      40
6  Team3    Dean Saunders      15
7  Team2  Michael Antonio      30
>>> pd.pivot_table(df,  index=['team'],
...     aggfunc={'person': lambda s: np.unique(s).size, 'number': np.sum})
       number  person
team                 
Team1     145       4
Team2      70       2
Team3      45       2

Otherwise, defaultdict-based solutions from other answers are enough.

Upvotes: 1

B. Eckles
B. Eckles

Reputation: 1644

There's probably a nice clean way of doing this, but doing it a little less so...

count = {row[0]:sum((1 for _row in data if _row[0] == row[0])) for row in data}
num = {row[0]:sum((_row[2] for _row in data if _row[0] == row[0])) for row in data}

Those generators/comprehensions could probably be written a little less confusingly, but you've already got so many answers to choose from!

Upvotes: 1

Arkady
Arkady

Reputation: 15059

You can do something like this:

from collections import defaultdict
out = defaultdict(dict)
for team, name, num in data:
    out[team].setdefault('count', 0)
    out[team].setdefault('sum', 0)
    out[team]['count'] += 1
    out[team]['sum'] += num

print dict(out)

Result:

{'Team1': {'count': 4, 'sum': 145},
 'Team2': {'count': 2, 'sum': 70},
 'Team3': {'count': 2, 'sum': 45}}

Upvotes: 2

Patrick Haugh
Patrick Haugh

Reputation: 61014

groupby is a function from itertools, but it's not quite what you want. Instead, lets import defaultdict from collections

from collections import defaultdict
def data_by_team(data):
    d = defaultdict(lambda: [0,0])
    for team, name, number in data:
        d[team][0] += 1
        d[team][1] += number
    return d

This returns a defaultdict that maps the names of the team to a list containing the number of players and the sum of their numbers.

Upvotes: 5

Related Questions