How to create summary table of unique value counts?

Question

I've scoured many, many other SO posts for an answer to this question, but haven't found quite what I'm looking for. Here goes:

Let's say we have a dataframe that looks like this:

In [7]: df.head(5)
Out[7]:
  bool_flag   group  int_flag
0     False  bottom         0
1     False     mid         1
2     False     top         1
3     False     top         0
4     False    high         1

Where there are five unique groups, two unique boolean values, and two unique integer values. I'd like to create a summary table like this:

                 bottom   low  mid  high  top
bool_flag  true       5     32   2    12    4
          false       2     42   7     2   10
int_flag      0       1     10  15     3    8 
              1      10     31  14     0    1

summarizing the unique value counts of each of the non-group columns, and grouped in columns of group.

I've gotten close. The following pivot_table command get me tables that resemble components of what I'd like to have.

In [8]: pd.pivot_table(df.drop('bool_flag', axis=1), columns=['group'], index=['int_flag'], aggfunc=len)
Out[8]:
group     bottom  high  low  mid  top
int_flag
0             15    11    8   13   13
1             12     5    8    9    6


In [9]: pd.pivot_table(df.drop('int_flag', axis=1), columns=['group'], index=['bool_flag'], aggfunc=len)
Out[9]:
group      bottom  high  low  mid  top
bool_flag
False          19    14   15   18   16
True            8     2    1    4    3

However, the index of the resulting table isn't the Multiindex I'd like to have, and thus makes concatenating that pivot table with the same for the bool_flag more difficult.

I would hope that there's a way to either use groupby or pivot_table to get what I want without generating these sub-tabulations and concatenating them, but so far I haven't been able to find it. Pivoting with multiple index columns results in too fine-grained a table (I don't want the count of (False, 0) pairs for (bool_flag, int_flag) values, for example, just the count of each unique value within each group.)

I also attempted to use groupby('group').agg(f), where I defined f to yield the result of calling value_counts() on each series. However, agg isn't compatible with functions that return DataFrames.

Any suggestions would be greatly appreciated!

How to create summary table of unique value counts?

Answers (1)

Related Questions