Reputation: 391
I am trying to get the percent of values in a column based on a list of unique values in another column. My dataframe has the following structure:
property_state_code | converted
--------------------------------
NY                  | converted
TX                  | converted
TX                  | Not Converted
CA                  | Not Converted
MO                  | converted
The results I want would be something like:
states | conversion_pct
-----------------------
NY     | 1
TX     | .5
CA     | 0
MO     | 1
My code so far is below. The error I am getting is:
"The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all()."
It seems to come from the line where I do the pct calculation (pct = ...), but I am not sure where in that line the error occurs, so any insight or help would be appreciated!
states = []
for val in results.property_state_code:
    if val not in states:
        states.append(val)
print(states)
conversion_pct = []
for state in states:
    if results['property_state_code'] == state:
        pct = (results['converted'].value_counts()['converted']) / ((results['converted'].value_counts()['converted']) + (results['converted'].value_counts()['Not Converted']))
        conversion_pct.append(pct)
Upvotes: 0
Views: 41
Reputation: 262634
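Use a custom groupby.agg. Compare the column to 'converted' to get booleans, group them by state, and take the mean, which is the fraction of True values: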
out = (
    df['converted'].eq('converted')
    .groupby(df['property_state_code'].rename('states'), sort=False)
    .mean().reset_index(name='conversion_pct')
)
Output:
  states  conversion_pct
0     NY             1.0
1     TX             0.5
2     CA             0.0
3     MO             1.0
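As for the error itself: `results['property_state_code'] == state` compares every row and returns a boolean Series, and `if` cannot reduce a whole Series to a single True/False, hence the "truth value of a Series is ambiguous" message. The `value_counts()` calls in the question also run on the full column rather than on one state's rows. If you would rather keep a loop, here is a minimal sketch that uses the comparison as a row mask instead of an `if` condition. It assumes the sample data from the question; the `mask` and `subset` names and the choice of a dict for the result are mine.

```python
import pandas as pd

# Sample data copied from the question.
results = pd.DataFrame({
    "property_state_code": ["NY", "TX", "TX", "CA", "MO"],
    "converted": ["converted", "converted", "Not Converted",
                  "Not Converted", "converted"],
})

# Comparing a column to a scalar gives one boolean per row, not a single
# bool, which is why using it directly in `if` raises the ValueError.
mask = results["property_state_code"] == "TX"
print(mask.tolist())  # [False, True, True, False, False]

# Loop version: select one state's rows with the mask, then take the
# share of rows whose value is exactly 'converted'.
conversion_pct = {}
for state in results["property_state_code"].unique():
    subset = results.loc[results["property_state_code"] == state, "converted"]
    conversion_pct[state] = float(subset.eq("converted").mean())

print(conversion_pct)  # {'NY': 1.0, 'TX': 0.5, 'CA': 0.0, 'MO': 1.0}
```

This gives the same numbers as the groupby version, but the groupby is shorter and avoids filtering the frame once per state.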
Upvotes: 1