Chris90
Chris90

Reputation: 1998

Calculating Percentage of True or False of multiple columns grouped by a different column

I have a df such as:

 Service |  DoorDash | Grubhub / Seamless | UberEats| Caviar | Postmates | JustEat | Deliveroo | Foodora | Grab | Talabat | Tock      
 Open         True          True                True     False    True        False     False      False     False  False     False
 Closed       True          False               True     True     False       False     False      False     False  False     True
 Open         True          False               False    False    False       False     False      False     False  False     False
 Open         False         False               False    False    False       False     False      False     False  False     False
 Closed       False         False               False    True     False       False     False      False     False  False     True
 Open         False         False               False    False    False       False     False      False     False  False     False
 Closed       False         False               False    False    False       False     False      False     False  False     True  
 Closed       True          True                True     False    True        False     False      False     False  False     False
 Open         False         True                True     False    False       False     False      False     False  False     False

I want to run a .value_counts(normalize=True) on the all the columns not named Service but grouped by the service column, so my output will be the % that DoorDash and the rest of the columns are True and False but grouped or filtered for Open or Closed in the Service Column.

Expected Output:

    DoorDash Grubhub / Seamless ........
Open   .60        .50
Closed .40        .50

But for all of the columns with a True or False value not just DoorDash, but I cant seem to find a way where I can do that calculation and my output will be for all the columns ?

Thanks

Upvotes: 1

Views: 176

Answers (1)

jezrael
jezrael

Reputation: 863116

You can use DataFrame.melt with SeriesGroupBy.value_counts:

df = (df.melt('Service')
        .groupby(['Service', 'variable'])['value']
        .value_counts(normalize=True)
        .unstack(1, fill_value=0))
print (df)
variable       Caviar  Deliveroo  DoorDash  Foodora  Grab  Grubhub / Seamless  \
Service value                                                                   
Closed  False     0.5        1.0       0.5      1.0   1.0                0.75   
        True      0.5        0.0       0.5      0.0   0.0                0.25   
Open    False     1.0        1.0       0.6      1.0   1.0                0.60   
        True      0.0        0.0       0.4      0.0   0.0                0.40   

variable       JustEat  Postmates  Talabat  Tock  UberEats  
Service value                                               
Closed  False      1.0       0.75      1.0  0.25       0.5  
        True       0.0       0.25      0.0  0.75       0.5  
Open    False      1.0       0.80      1.0  1.00       0.6  
        True       0.0       0.20      0.0  0.00       0.4  

Upvotes: 1

Related Questions