Chris90
Chris90

Reputation: 1998

Calculating grouped by % based on if there are contained values in numerator and unique column value in denominator

I am trying to compute a ratio or % that takes the number of occurrences of a grouped by column (Service Column) that has at least one of two possible values (Food or Beverage) and then divide it over the number of unique column (Business Column) values in the df but am having trouble.

Original df:

Rep      | Business | Service
Cindy    Shakeshake    Food
Cindy    Shakeshake    Outdoor
Kim      BurgerKing    Beverage
Kim      Burgerking    Phone
Kim      Burgerking    Car
Nate     Tacohouse     Food
Nate     Tacohouse     Car
Tim      Cofeeshop     Coffee
Tim      Coffeeshop    Seating
Cindy    Italia        Seating
Cindy    Italia        Coffee



 Desired Output:
  Rep    | %
  Cindy    .5
  Kim       1
  Nate      1
  Tim       0

Where % is the number of Businesses cindy has with at least 1 Food or Beverage row divided by all unique Businesses in df for her.

I am trying something like below:

     (df.assign(Service=df.Service.isin(['Food','Beverage']).astype(int))
       .groupby('Rep')
       .agg({'Business':'nunique','Service':'count'}))

s['Service']/s['Business']

But this doesnt give me what im looking for as the service only gives all rows in df for cindy in this case 4 and the Businees column isnt giving me an accurate # of where she has food or beverage in a grouped by business.

Thanks for looking and possible help in advance.

Upvotes: 1

Views: 53

Answers (2)

jezrael
jezrael

Reputation: 863146

I you think you need aggregate sum for count matched values:

df1 = (df.assign(Service=df.Service.isin(['Food','Beverage']).astype(int))
       .groupby('Rep')
       .agg({'Business':'nunique','Service':'sum'}))
print (df1)
       Business  Service
Rep                     
Cindy         2        1
Kim           2        1
Nate          1        1
Tim           2        0

s = df1['Service']/df1['Business']
print (s)
Cindy    0.5
Kim      0.5
Nate     1.0
Tim      0.0
dtype: float64

Upvotes: 1

cy9
cy9

Reputation: 16

There is a small mistake that you made in your code here:

s=(df.assign(Service=df.Service.isin(['Food','Beverage']).astype(int))
       .groupby('Rep')
       .agg({'Business':'nunique','Service':'count'}))

s['Service']/s['Business']

You would need to change 'Service':'count' to 'Service':'sum'. count just counts the number of rows that each Rep has. With sum, it counts the number of rows that each Rep has that is either Food or Beverage service.

Upvotes: 0

Related Questions