Reputation: 1998
I have code below as:
df[('name')]['cash_amount'].valuecounts(normalize=True).sum()
I want to use valuecounts normalize true, because I want to calculate the % of each names cash over the total amount of cash in the column.
Where I am trying to calculate the total number each name has in the cash_amount column, but I get error that says - KeyError: 'cash_amount'
df looks like
input:
name | cash_amount
bob $400
chris $500
amy $100
amy $100
bob $100
bob $100
output:
name | %
bob .46
chris .38
amy .15
I looked for any white spaces in the column name and tried df.columns = df.columns.str.strip()
and still get same error
Upvotes: 1
Views: 55
Reputation: 26676
Please use df.replace
, groupby()
and apply lambda grouped sum divided by total sum
df['cash_amount']=df.replace(regex=r'\$', value='')['cash_amount'].astype(int)
(df.groupby('name').cash_amount.apply(lambda x: x.sum())/df.cash_amount.sum()).rename('%').reset_index()
name %
0 amy 0.153846
1 bob 0.461538
2 chris 0.384615
Upvotes: 1
Reputation: 16683
First remove the $
from your string and convert to float
or int
. $
is a regex character, so you need to escape it with \
. Then .groupby
and get the percentage of total by getting the sum for each group and dividing it by the toal sum:
import pandas as pd
df = pd.DataFrame({'name': {0: 'bob', 1: 'chris', 2: 'amy', 3: 'amy', 4: 'bob', 5: 'bob'},
'cash_amount': {0: '$400', 1: '$500', 2: '$100', 3: '$100', 4: '$100', 5: '$100'}})
df['cash_amount'] = df['cash_amount'].str.replace('\$', '').astype(float)
df = ((df.groupby('name')['cash_amount'].sum() / df['cash_amount'].sum())
.rename('%').reset_index())
df
Out[1]:
name %
0 amy 0.153846
1 bob 0.461538
2 chris 0.384615
Upvotes: 1