Reputation: 16054
I have a dataframe
df_input = pd.DataFrame(
{
"col_cate": ['A', 'A', 'B', 'B', 'A', 'A', 'B', 'B'],
"target_bool": [True, False, True, False, True, False, True, False]
}
)
And I want to count the number of unique categories. So I am expecting the output to be like this
col_cate, target_bool, cnt
'A' , True , 2
'A' , False , 2
'B' , True , 2
'B' , False , 2
But df_input.group_by(["col_cate", "target_bool"]).count()
gives
Empty DataFrame
Columns: []
Index: [(A, False), (A, True), (B, False), (B, True)]
But adding a dummy
to the df_input
works, like df_input["dummy"] = 1
.
How do I get the group by count table without adding a dummy?
Upvotes: 2
Views: 1727
Reputation: 863166
Because function GroupBy.count
is used for counts values with exclude missing values if exist is necessary specify column after groupby, if both columns are used in by
parameter in groupby
:
df = (df_input.groupby(by=["col_cate", "target_bool"])['col_cate']
.count()
.reset_index(name='cnt'))
print (df)
col_cate target_bool cnt
0 A False 2
1 A True 2
2 B False 2
3 B True 2
If want count all columns, here both is it possible (but here always same output) if specify both columns:
df1 = (df_input.groupby(["col_cate", "target_bool"])[['col_cate','target_bool']]
.count()
.add_suffix('_count')
.reset_index())
print (df1)
col_cate target_bool col_cate_count target_bool_count
0 A False 2 2
1 A True 2 2
2 B False 2 2
3 B True 2 2
Or if use GroupBy.size
method it working a bit different - it count all values, not exclude missing, so no column is necessary specify:
df = df_input.groupby(["col_cate", "target_bool"]).size().reset_index(name='cnt')
print (df)
col_cate target_bool cnt
0 A False 2
1 A True 2
2 B False 2
3 B True 2
Upvotes: 3
Reputation: 625
df_input.groupby('col_cate')['target_bool'].value_counts()
col_cate target_bool
A False 2
True 2
B False 2
True 2
then you can reset_index()
Upvotes: 7
Reputation: 34086
Like this also:
In [54]: df_input.groupby(df_input.columns.tolist()).size().reset_index().\
...: rename(columns={0:'cnt'})
Out[54]:
col_cate target_bool cnt
0 A False 2
1 A True 2
2 B False 2
3 B True 2
Upvotes: 1