FrenchConnections

Reputation: 391

Calculate percent of values based on column in dataframe

I am trying to get the percent of values in a column based on a list of unique values in another column. My dataframe has the following structure:

property_state_code | converted
--------------------------------
NY                  | converted
TX                  | converted
TX                  | Not Converted
CA                  | Not Converted
MO                  | converted

The results I want would be something like:

states | conversion_pct
-----------------------
NY     | 1
TX     | 0.5
CA     | 0
MO     | 1

My code thus far is below. The error I am getting is

"The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all()."

It is raised in the line where I am trying to do the pct calculation (pct = ...). I am not sure where in that line the error occurs, so any insight or help would be appreciated!

states = []
for val in results.property_state_code:
  if val not in states:
    states.append(val)
print(states)

conversion_pct = []
for state in states:
  if results['property_state_code'] == state:
    pct = (results['converted'].value_counts()['converted']) / ((results['converted'].value_counts()['converted']) + (results['converted'].value_counts()['Not Converted']))
    conversion_pct.append(pct)

Upvotes: 0

Views: 41

Answers (1)

mozway

Reputation: 262634

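Compare the column to 'converted' to get booleans, then use groupby with mean (the mean of a boolean Series is the fraction of True values):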

out = (
    df['converted'].eq('converted')
    .groupby(df['property_state_code'].rename('states'), sort=False)
    .mean()
    .reset_index(name='conversion_pct')
)

Output:


  states  conversion_pct
0     NY             1.0
1     TX             0.5
2     CA             0.0
3     MO             1.0
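
As for the error itself: it most likely comes from the `if results['property_state_code'] == state:` line rather than the `pct = ...` line. That comparison returns a boolean Series (one value per row), and `if` cannot reduce a Series to a single True/False. Note also that the `pct` expression in the question counts over the whole column, not just the rows of the current state. If you prefer to keep an explicit loop, a minimal sketch is below. It assumes the sample data from the question and stores the result in a dict rather than a list, which is my choice and not something the question requires.

```python
import pandas as pd

# Sample data copied from the question; `results` is the asker's DataFrame name.
results = pd.DataFrame({
    "property_state_code": ["NY", "TX", "TX", "CA", "MO"],
    "converted": ["converted", "converted", "Not Converted",
                  "Not Converted", "converted"],
})

conversion_pct = {}
# unique() returns the states in order of first appearance.
for state in results["property_state_code"].unique():
    # Boolean mask for this state's rows; used for indexing, not inside `if`.
    mask = results["property_state_code"] == state
    # Fraction of this state's rows whose value is exactly 'converted'.
    is_converted = results.loc[mask, "converted"] == "converted"
    conversion_pct[state] = float(is_converted.mean())

print(conversion_pct)
```

The groupby version above does the same thing without a Python-level loop, so it is usually the better choice on large frames.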

Upvotes: 1
