Chris90
Chris90

Reputation: 1998

Error on value column in group by value counts

I have code below as:

df[('name')]['cash_amount'].valuecounts(normalize=True).sum()

I want to use valuecounts normalize true, because I want to calculate the % of each names cash over the total amount of cash in the column.

Where I am trying to calculate the total number each name has in the cash_amount column, but I get error that says - KeyError: 'cash_amount'

df looks like

input:

name | cash_amount
bob      $400
chris    $500
amy      $100
amy      $100
bob      $100
bob      $100

output:

name |  % 
bob    .46
chris  .38
amy    .15

I looked for any white spaces in the column name and tried df.columns = df.columns.str.strip() and still get same error

Upvotes: 1

Views: 55

Answers (2)

wwnde
wwnde

Reputation: 26676

Please use df.replace, groupby() and apply lambda grouped sum divided by total sum

 df['cash_amount']=df.replace(regex=r'\$', value='')['cash_amount'].astype(int)
(df.groupby('name').cash_amount.apply(lambda x: x.sum())/df.cash_amount.sum()).rename('%').reset_index()


 name         %
0    amy  0.153846
1    bob  0.461538
2  chris  0.384615

Upvotes: 1

David Erickson
David Erickson

Reputation: 16683

First remove the $ from your string and convert to float or int. $ is a regex character, so you need to escape it with \. Then .groupby and get the percentage of total by getting the sum for each group and dividing it by the toal sum:

import pandas as pd
df = pd.DataFrame({'name': {0: 'bob', 1: 'chris', 2: 'amy', 3: 'amy', 4: 'bob', 5: 'bob'},
 'cash_amount': {0: '$400', 1: '$500', 2: '$100', 3: '$100', 4: '$100', 5: '$100'}})
df['cash_amount'] = df['cash_amount'].str.replace('\$', '').astype(float)
df = ((df.groupby('name')['cash_amount'].sum() / df['cash_amount'].sum())
      .rename('%').reset_index())
df
Out[1]: 
    name         %
0    amy  0.153846
1    bob  0.461538
2  chris  0.384615

Upvotes: 1

Related Questions