xiaodai
xiaodai

Reputation: 16054

pandas: How to count the unique categories?

I have a dataframe

df_input = pd.DataFrame(
        {
            "col_cate": ['A', 'A', 'B', 'B', 'A', 'A', 'B', 'B'],
            "target_bool": [True, False, True, False, True, False, True, False]
        }
    )

And I want to count the number of unique categories. So I am expecting the output to be like this

col_cate, target_bool, cnt
'A'     , True       , 2
'A'     , False      , 2
'B'     , True       , 2
'B'     , False      , 2

But df_input.group_by(["col_cate", "target_bool"]).count() gives

Empty DataFrame
Columns: []
Index: [(A, False), (A, True), (B, False), (B, True)]

But adding a dummy to the df_input works, like df_input["dummy"] = 1.

How do I get the group by count table without adding a dummy?

Upvotes: 2

Views: 1727

Answers (3)

jezrael
jezrael

Reputation: 863166

Because function GroupBy.count is used for counts values with exclude missing values if exist is necessary specify column after groupby, if both columns are used in by parameter in groupby:

df = (df_input.groupby(by=["col_cate", "target_bool"])['col_cate']
              .count()
              .reset_index(name='cnt'))
print (df)
  col_cate  target_bool  cnt
0        A        False    2
1        A         True    2
2        B        False    2
3        B         True    2

If want count all columns, here both is it possible (but here always same output) if specify both columns:

df1 = (df_input.groupby(["col_cate", "target_bool"])[['col_cate','target_bool']]
               .count()
               .add_suffix('_count')
               .reset_index())
print (df1)
  col_cate  target_bool  col_cate_count  target_bool_count
0        A        False               2                  2
1        A         True               2                  2
2        B        False               2                  2
3        B         True               2                  2

Or if use GroupBy.size method it working a bit different - it count all values, not exclude missing, so no column is necessary specify:

df = df_input.groupby(["col_cate", "target_bool"]).size().reset_index(name='cnt')
print (df)
  col_cate  target_bool  cnt
0        A        False    2
1        A         True    2
2        B        False    2
3        B         True    2

Upvotes: 3

Alice jinx
Alice jinx

Reputation: 625

df_input.groupby('col_cate')['target_bool'].value_counts()

col_cate  target_bool
A         False          2
          True           2
B         False          2
          True           2

then you can reset_index()

Upvotes: 7

Mayank Porwal
Mayank Porwal

Reputation: 34086

Like this also:

In [54]: df_input.groupby(df_input.columns.tolist()).size().reset_index().\ 
    ...:     rename(columns={0:'cnt'}) 
Out[54]: 
  col_cate  target_bool  cnt
0        A        False    2
1        A         True    2
2        B        False    2
3        B         True    2

Upvotes: 1

Related Questions