Reputation: 133
I have the following Pandas dataframe:
Name | EventSignupNo | Attended | Points
Smith | 0145 | Y | 20.24
Smith | 0174 | Y | 29.14
Smith | 0239 | N | 0
Adams | 0145 | N | 0
Adams | 0174 | Y | 33.43
Morgan | 0239 | Y | 31.23
Morgan | 0244 | Y | 23.15
and what I'd like is a count of the number of events attended and not attended per person, and the sum of their points, per person. So I do a groupby: df.groupby([Name, Attended]).agg({"Attended": "count", "Points": "sum"}).rename(columns = {"Attended: "Count"}).reset_index()
which would give me something like:
Name | Attended | Count | Points
Smith | Y | 2 | 49.38
Smith | N | 1 | 0
Adams | Y | 1 | 33.43
Adams | N | 1 | 0
Morgan | Y | 2 | 54.38
but I'd want something like:
Name | Attended | Count | Points
Smith | Y | 2 | 49.38
Smith | N | 1 | 0
Adams | Y | 1 | 33.43
Adams | N | 1 | 0
Morgan | Y | 2 | 54.38
Morgan | N | 0 | 0
I tried playing around with pd.MultiIndex to try to fill the missing zero count, but to no avail. I've read the other similar questions but I'm having trouble dealing with the continuous Points column using MultiIndex. Any idea how to do this?
Upvotes: 2
Views: 5370
Reputation: 402553
You could do this with groupby
+ agg
. For your exact output with Y
and N
at each level, you'd need reindex
:
g = df.groupby(['Name', 'Attended'], sort=False).Points.agg(['count', 'sum'])
g
count sum
Name Attended
Smith Y 2 49.38
N 1 0.00
Adams N 1 0.00
Y 1 33.43
Morgan Y 2 54.38
idx = pd.MultiIndex.from_product([g.index.levels[0], ['Y', 'N']])
idx
MultiIndex(levels=[['Adams', 'Morgan', 'Smith'], ['N', 'Y']],
labels=[[2, 2, 0, 0, 1, 1], [1, 0, 1, 0, 1, 0]])
g.reindex(idx, fill_value=0)
count sum
Smith Y 2 49.38
N 1 0.00
Adams Y 1 33.43
N 1 0.00
Morgan Y 2 54.38
N 0 0.00
Upvotes: 3