Reputation: 73
I have a dataset that looks sort of like:
ID| Hungry? | Height |Outcome | other1 | other2 | other3
1 yes tall good
2 no short bad
3 yes medium okay
4 yes tall bad
5 no tall good
5 no short good
5 no short good
I want to be able to arbitrarily produce two very similar outputs that aggregate the counts of each outcome divided according to the categorical or yes/no traits:
For categorical columns:
|short|medium|tall|All
bad 1 0 1 2
okay 0 1 0 1
good 2 0 1 3
For yes/no columns:
|hungry|nothungry|All
bad 1 1 2
okay 1 0 1
good 1 3 4
This is the same as doing df.values_count('Outcome'), but divided additionally along columns. I have tried doing this with pivot_table, but I can't figure out how to make it come out correct when I'm doing a row count rather than aggregating a pre-existing quantitative value. It ends up using all of the columns from the original dataset.
The closest I have come is
df.pivot_table(index='Outcome',columns='Height',aggfunc='count',values='Other1',margins=True)
In that case, Other1 is an arbitrary column, which seems like a kludge. Also this works correctly for my categorical columns, like Height, but I'm not sure how I can extend it to the yes/no ones.
Is there a simpler way to do this?
Upvotes: 0
Views: 38
Reputation: 61920
You could use pandas.crosstab:
result = pd.crosstab(index=df.Outcome,columns=df.Height, margins=True)
print(result)
Output
Height medium short tall All
Outcome
bad 0 1 1 2
good 0 2 2 4
okay 1 0 0 1
All 1 3 3 7
The same applies for the Hungry
column:
result = pd.crosstab(index=df.Outcome,columns=df.Hungry, margins=True)
print(result)
Output
Hungry no yes All
Outcome
bad 1 1 2
good 3 1 4
okay 0 1 1
All 4 3 7
Upvotes: 1