Reputation: 1998
I have a df such as:
Service  DoorDash  Grubhub / Seamless  UberEats  Caviar  Postmates  JustEat  Deliveroo  Foodora  Grab   Talabat  Tock
Open     True      True                True      False   True       False    False      False    False  False    False
Closed   True      False               True      True    False      False    False      False    False  False    True
Open     True      False               False     False   False      False    False      False    False  False    False
Open     False     False               False     False   False      False    False      False    False  False    False
Closed   False     False               False     True    False      False    False      False    False  False    True
Open     False     False               False     False   False      False    False      False    False  False    False
Closed   False     False               False     False   False      False    False      False    False  False    True
Closed   True      True                True      False   True       False    False      False    False  False    False
Open     False     True                True      False   False      False    False      False    False  False    False
I want to run .value_counts(normalize=True)
on all the columns except Service, grouped by the Service column, so the output shows the percentage of True and False values for DoorDash and every other column, split by Open and Closed in the Service column.
Expected output:
        DoorDash  Grubhub / Seamless  ........
Open    .60       .50
Closed  .40       .50
I want this for all of the columns with True/False values, not just DoorDash, but I can't find a way to do the calculation so that the output covers every column.
Thanks
Upvotes: 1
Views: 176
Reputation: 863116
You can use DataFrame.melt with SeriesGroupBy.value_counts:
df = (df.melt('Service')
        .groupby(['Service', 'variable'])['value']
        .value_counts(normalize=True)
        .unstack(1, fill_value=0))
print(df)
variable Caviar Deliveroo DoorDash Foodora Grab Grubhub / Seamless \
Service value
Closed False 0.5 1.0 0.5 1.0 1.0 0.75
True 0.5 0.0 0.5 0.0 0.0 0.25
Open False 1.0 1.0 0.6 1.0 1.0 0.60
True 0.0 0.0 0.4 0.0 0.0 0.40
variable JustEat Postmates Talabat Tock UberEats
Service value
Closed False 1.0 0.75 1.0 0.25 0.5
True 0.0 0.25 0.0 0.75 0.5
Open False 1.0 0.80 1.0 1.00 0.6
True 0.0 0.20 0.0 0.00 0.4
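Since all the non-Service columns are boolean, an equivalent and often shorter route is a plain group-wise mean: the mean of a boolean column is exactly the share of True values, and `1 - mean` is the share of False. A minimal sketch using a subset of the question's columns (`true_share` is my own variable name, not from the original post):

```python
import pandas as pd

# Subset of the question's data: DoorDash and UberEats columns only
df = pd.DataFrame({
    'Service':  ['Open', 'Closed', 'Open', 'Open', 'Closed',
                 'Open', 'Closed', 'Closed', 'Open'],
    'DoorDash': [True, True, True, False, False,
                 False, False, True, False],
    'UberEats': [True, True, False, False, False,
                 False, False, True, True],
})

# The mean of a boolean column is the fraction of True values,
# so grouping by Service and averaging gives the True share per state
true_share = df.groupby('Service').mean()

print(true_share)       # share of True per service state
print(1 - true_share)   # share of False per service state
```

This skips the melt/unstack reshaping when you only need the True (or False) percentages; use the accepted answer's approach if you want both True and False rows stacked in one index, as shown in its output.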
Upvotes: 1