Reputation: 159
I have a list of tuples:
data = [('Team1', 'Mark Owen', 40),
('Team1', 'John Doe', 25),
('Team2', 'Raj Patel', 40),
('Team3', 'Matt Le Blanc', 30),
('Team1', 'Rene Russo', 40),
('Team1', 'Ronald Regan', 40),
('Team3', 'Dean Saunders', 15),
('Team2', 'Michael Antonio', 30)]
I would like to groupby Team (index[0] of each tuple), count the number or persons in each team (index[1]) and sum the numbers related to each Team (index[2]) but I cannot quite figure this out. So far I have tried using defaultdict(list) which returns a dict, for example I've tried this to group by Team:
def create_hrs_totals():
result = defaultdict(list)
for k, *v in data():
result[k] += v
return dict(result)
but then I am struggling with working with that output to achieve what I need using a list comp or whatever... The result I am looking for is a new list:
[Team1, 4, 145,
Team2, 2, 80,
Team3, 2, 70]
Is there a better way of doing this?
Upvotes: 3
Views: 1849
Reputation: 414325
If you need to do different variations of this often, a pivot table could be used e.g. pandas.pivot_table()
:
>>> import numpy as np
>>> import pandas ad pd
>>> df = pd.DataFrame(data, columns=['team', 'person', 'number'])
>>> df
team person number
0 Team1 Mark Owen 40
1 Team1 John Doe 25
2 Team2 Raj Patel 40
3 Team3 Matt Le Blanc 30
4 Team1 Rene Russo 40
5 Team1 Ronald Regan 40
6 Team3 Dean Saunders 15
7 Team2 Michael Antonio 30
>>> pd.pivot_table(df, index=['team'],
... aggfunc={'person': lambda s: np.unique(s).size, 'number': np.sum})
number person
team
Team1 145 4
Team2 70 2
Team3 45 2
Otherwise, defaultdict
-based solutions from other answers are enough.
Upvotes: 1
Reputation: 1644
There's probably a nice clean way of doing this, but doing it a little less so...
count = {row[0]:sum((1 for _row in data if _row[0] == row[0])) for row in data}
num = {row[0]:sum((_row[2] for _row in data if _row[0] == row[0])) for row in data}
Those generators/comprehensions could probably be written a little less confusingly, but you've already got so many answers to choose from!
Upvotes: 1
Reputation: 15059
You can do something like this:
from collections import defaultdict
out = defaultdict(dict)
for team, name, num in data:
out[team].setdefault('count', 0)
out[team].setdefault('sum', 0)
out[team]['count'] += 1
out[team]['sum'] += num
print dict(out)
Result:
{'Team1': {'count': 4, 'sum': 145},
'Team2': {'count': 2, 'sum': 70},
'Team3': {'count': 2, 'sum': 45}}
Upvotes: 2
Reputation: 61014
groupby
is a function from itertools
, but it's not quite what you want. Instead, lets import defaultdict
from collections
from collections import defaultdict
def data_by_team(data):
d = defaultdict(lambda: [0,0])
for team, name, number in data:
d[team][0] += 1
d[team][1] += number
return d
This returns a defaultdict
that maps the names of the team to a list containing the number of players and the sum of their numbers.
Upvotes: 5