Reputation: 3640
Let's say I have a DataFrame that looks like this:
Bank Name House This Wk
Barc Germany 100
Barc UK 300
Barc UK 500
JPM Japan 200
JPM NYC 100
BOA LA 900
BOA LA 50
BOA LA 50
DB Italy 45
I would like to group-by Bank Name, while outputting the largest House Value as well as the total value...
For example, using the example above would result in:
Bank Name Total House This Wk
Barc 900 UK 500
JPM 300 Japan 200
BOA 1000 LA 900
DB 45 Italy 45
Essentially, it is grouping the Total
by Bank Name, but also outputting the largest contributor, House
, to the total and the amount contributed is This Wk
.
How can I go about doing this?
Upvotes: 0
Views: 81
Reputation: 402263
You can consider df.groupby
with a list of dfGroupBy.agg
functions:
In [732]: out = df.groupby('Bank Name')['This Wk'].agg(['sum', 'idxmax', 'max'])\
.rename(columns={'sum' : 'Total', 'idxmax' : 'House', 'max' : 'This Wk'})\
.reset_index()
In [734]: out['House'] = df.loc[out['House'], 'House'].values; out
Out[734]:
Bank Name Total House This Wk
0 BOA 1000 LA 900
1 Barc 900 UK 500
2 DB 45 Italy 45
3 JPM 300 Japan 200
Upvotes: 3
Reputation: 76917
Another way using apply
would be
In [17]: (df.groupby('Bank Name', sort=False)
.apply(lambda x: pd.Series(
[x['This Wk'].sum(),
x.loc[x['This Wk'].idxmax(), 'House'],
x['This Wk'].max()],
index=['Total', 'House', 'This Wk']))
.reset_index())
Out[17]:
Bank Name Total House This Wk
0 Barc 900 UK 500
1 JPM 300 Japan 200
2 BOA 1000 LA 900
3 DB 45 Italy 45
Upvotes: 0
Reputation: 210822
In [121]: df.groupby('Bank Name', group_keys=False) \
...: .apply(lambda x: x.nlargest(1, 'This Wk').assign(Total=x['This Wk'].sum())) \
...: [['Bank Name','Total','House','This Wk']]
...:
Out[121]:
Bank Name Total House This Wk
5 BOA 1000 LA 900
2 Barc 900 UK 500
8 DB 45 Italy 45
3 JPM 300 Japan 200
Upvotes: 5