Reputation:
have a df with values
df
name marks
mark 10
mark 40
tom 25
tom 20
mark 50
tom 5
tom 50
tom 25
tom 10
tom 15
How to sum the marks of names and count how many times it took
expected_output:
name total count
mark 100 3
tom 150 7
Upvotes: 1
Views: 51
Reputation: 12523
Here's a solution. I'm doing it step-by-step for simplicity:
df["commulative_sum"] = df.groupby("name").cumsum()
df["commulative_sum_50"] = df["commulative_sum"] // 50
df["commulative_count"] = df.assign(one = 1).groupby("name").cumsum()["one"]
res = pd.pivot_table(df, index="name", columns="commulative_sum_50", values="commulative_count", aggfunc=min).drop(0, axis=1)
# the following two lines can be done in a loop if there are a lot of columns. I simplified it here.
res[3] = res[3]-res[2]
res[2] = res[2]-res[1]
res.columns = ["50-" + str(c) for c in res.columns]
The result is:
50-1 50-2 50-3
name
mark 2.0 1.0 NaN
tom 3.0 1.0 3.0
Upvotes: 1
Reputation: 1776
Edit: I see you changed the "to 50" part, this might not be revelant anymore.
I agree with Jezrael's answer, but in case you only want to count to 50, maybe something like this:
l = []
for name, group in df.groupby('name')['marks']:
l.append({'name': name,
'total': group.sum(),
'count': group.loc[:group.eq(50).idxmax()].count()})
pd.DataFrame(l)
name total count
0 mark 100 3
1 tom 150 4
Upvotes: 0
Reputation: 863166
Here is possible use aggregate by named aggregations:
df = df.groupby('name').agg(total=('marks','sum'),
count=('marks','size')).reset_index()
print (df)
name total count
0 mark 100 3
1 tom 150 7
Or with specify column after groupby
and pass tuples:
df = df.groupby('name')['marks'].agg([('total', 'sum'),
('count','size')]).reset_index()
print (df)
name total count
0 mark 100 3
1 tom 150 7
Upvotes: 1