Celongar
Celongar

Reputation: 73

Pandas: Grouping a dataset Along Columns and Rows

I have a dataset that looks sort of like:

ID| Hungry? | Height  |Outcome | other1 | other2 | other3
1    yes     tall      good
2    no      short     bad 
3    yes     medium     okay 
4    yes     tall       bad  
5    no     tall      good
5    no     short      good
5    no     short      good

I want to be able to arbitrarily produce two very similar outputs that aggregate the counts of each outcome divided according to the categorical or yes/no traits:

For categorical columns:

       |short|medium|tall|All
bad       1    0     1     2
okay      0    1     0     1
good      2    0     1     3

For yes/no columns:

      |hungry|nothungry|All
bad       1     1         2
okay      1     0         1
good      1     3         4

This is the same as doing df.values_count('Outcome'), but divided additionally along columns. I have tried doing this with pivot_table, but I can't figure out how to make it come out correct when I'm doing a row count rather than aggregating a pre-existing quantitative value. It ends up using all of the columns from the original dataset.

The closest I have come is

df.pivot_table(index='Outcome',columns='Height',aggfunc='count',values='Other1',margins=True)

In that case, Other1 is an arbitrary column, which seems like a kludge. Also this works correctly for my categorical columns, like Height, but I'm not sure how I can extend it to the yes/no ones.

Is there a simpler way to do this?

Upvotes: 0

Views: 38

Answers (1)

Dani Mesejo
Dani Mesejo

Reputation: 61920

You could use pandas.crosstab:

result = pd.crosstab(index=df.Outcome,columns=df.Height, margins=True)
print(result)

Output

Height   medium  short  tall  All
Outcome                          
bad           0      1     1    2
good          0      2     2    4
okay          1      0     0    1
All           1      3     3    7

The same applies for the Hungry column:

result = pd.crosstab(index=df.Outcome,columns=df.Hungry, margins=True)
print(result)

Output

Hungry   no  yes  All
Outcome              
bad       1    1    2
good      3    1    4
okay      0    1    1
All       4    3    7

Upvotes: 1

Related Questions