Reputation: 77
This is my DataFrame:
import pandas as pd

df = pd.DataFrame([
    ('a', 0, 0),
    ('b', 1, 1),
    ('c', 1, 0),
    ('d', 2, 1),
    ('e', 2, 1)
], columns=['name', 'cluster', 'is_selected'])
I want to count how many letters are selected in each cluster, grouped by cluster.
I tried this:
df.groupby('cluster')['is_selected'].value_counts()
and I get this output:
cluster  is_selected
0        0              1
1        0              1
         1              1
2        1              2
Name: is_selected, dtype: int64
But what I want is this format:
cluster  count_selected
0        1
1        1
2        2
How can I fix this?
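A minimal sketch that stays close to the value_counts attempt above: the Series it returns is indexed by both cluster and is_selected, so unstacking the is_selected level turns it into one row per cluster, and the column labelled 1 then holds the number of selected rows. This assumes is_selected only holds 0 and 1 and that at least one row is selected somewhere (otherwise the column 1 would not exist). Note that cluster 0 comes out as 0 here, since its only row ('a') is not selected.

```python
import pandas as pd

df = pd.DataFrame([
    ('a', 0, 0),
    ('b', 1, 1),
    ('c', 1, 0),
    ('d', 2, 1),
    ('e', 2, 1),
], columns=['name', 'cluster', 'is_selected'])

# value_counts returns a Series indexed by (cluster, is_selected).
counts = df.groupby('cluster')['is_selected'].value_counts()

# Move the is_selected level into columns; missing combinations become 0.
table = counts.unstack(fill_value=0)

# Column 1 holds the number of selected rows per cluster (assumes 0/1 values).
result = table[1].rename('count_selected').reset_index()
print(result)
```

The same unstacked table also gives the unselected counts in column 0, which can be handy if both are needed.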
Upvotes: 0
Views: 1608
Reputation: 148870
This counts the selected rows in each cluster and keeps clusters with no selected rows, reporting them as 0:
df.where(df['is_selected'] == 1).groupby('cluster')['is_selected'].count().rename(
'count_selected').reindex(df['cluster'].drop_duplicates()).fillna(0).astype(int).reset_index()
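If is_selected only ever holds 0 or 1 (an assumption based on the sample data), a shorter alternative to the where/reindex chain is to sum the column per cluster: the sum equals the number of selected rows, and clusters with nothing selected naturally get 0. This is a sketch under that assumption, not the answer's own method.

```python
import pandas as pd

df = pd.DataFrame([
    ('a', 0, 0),
    ('b', 1, 1),
    ('c', 1, 0),
    ('d', 2, 1),
    ('e', 2, 1),
], columns=['name', 'cluster', 'is_selected'])

# With 0/1 flags, summing per cluster counts the selected rows,
# and every cluster stays in the result (cluster 0 gets 0).
result = (
    df.groupby('cluster')['is_selected']
    .sum()
    .reset_index(name='count_selected')
)
print(result)
```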
Upvotes: 1
Reputation: 515
Based on your explanation, you want to count the letters that are selected (a value of 1 in is_selected), grouped by cluster.
If that's what you're looking for, this should help:
df[df.is_selected == 1].groupby(['cluster'])['name'].count().reset_index(name='count_selected')
The output is a little different from yours, but I'm not sure why cluster 0 would have a count of 1 in your expected output, since its only row ('a') is not selected. I hope this is what you need!
Output:
   cluster  count_selected
0        1               1
1        2               2
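Filtering first drops cluster 0 entirely, because none of its rows survive the is_selected == 1 mask. If clusters with no selected rows should still appear, one possible extension (a sketch, not part of the original answer) is to reindex the counts against every cluster in the original frame with a fill value of 0. The name all_clusters is just an illustrative variable.

```python
import pandas as pd

df = pd.DataFrame([
    ('a', 0, 0),
    ('b', 1, 1),
    ('c', 1, 0),
    ('d', 2, 1),
    ('e', 2, 1),
], columns=['name', 'cluster', 'is_selected'])

# Same filtering approach as above: count selected names per cluster.
selected = df[df['is_selected'] == 1]
counts = selected.groupby('cluster')['name'].count()

# Reindex against all clusters so those with no selected rows get 0.
all_clusters = pd.Index(df['cluster'].unique(), name='cluster')
result = counts.reindex(all_clusters, fill_value=0).reset_index(name='count_selected')
print(result)
```

Passing fill_value=0 to reindex keeps the counts as integers instead of introducing NaN and a float dtype.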
Upvotes: 1