Reputation: 1991
Afternoon All,
I have a very large dataset which I have groupedby. Here is a sample :
df_ccy = df.groupby(['currency_str','state'
['state'].count().reset_index(name='count').sort_values(['count'], ascending=False)
display(df_ccy)
Output:
currency_str state count
USD Traded Away 148
AUD Dealer Timeout 52
CAD Done 44
USD Covered 38
USD Dealer Timeout 29
ZAR Done 22
I would like to only show:
CAD Done 44
ZAR Done 22
I achieved this via:
display(df_ccy [df_ccy ['state']=='Done'][['currency_str','state','count']])
Should I use a Lambda function on the original groupby statement or filter as I have done above? What is best practice?
Upvotes: 1
Views: 693
Reputation: 863301
df = pd.DataFrame({'currency_str': ['USD', 'AUD', 'CAD', 'CAD', 'ZAR',
'ZAR', 'USD', 'USD', 'ZAR'],
'state': ['Traded Away', 'Dealer Timeout', 'Done', 'Done', 'Done',
'Done', 'Covered', 'Dealer Timeout', 'Done']})
print (df)
currency_str state
0 USD Traded Away
1 AUD Dealer Timeout
2 CAD Done
3 CAD Done
4 ZAR Done
5 ZAR Done
6 USD Covered
7 USD Dealer Timeout
8 ZAR Done
I think you need filter first:
df1 = df[df['state']=='Done']
#alternative
#df1 = df.query("state == 'Done'")
And then count:
df_ccy = (df1.groupby(['currency_str','state'])['state']
.count()
.reset_index(name='count')
.sort_values(['count'], ascending=False))
print (df_ccy)
currency_str state count
1 ZAR Done 3
0 CAD Done 2
Or if is not important column state with same filtered value:
df_ccy = (df1['currency_str'].value_counts()
.reset_index(name='count')
.rename(columns={'index':'currency_str'}))
print (df_ccy)
currency_str count
0 ZAR 3
1 CAD 2
Upvotes: 1