Reputation: 1511
I have the following pandas dataframe (I am using python):
import pandas as pd
df = pd.DataFrame({
'date': ['2020-07-01', '2020-07-01', '2020-07-01', '2020-07-01', '2020-07-02', '2020-07-02', '2020-07-02'],
'value': ['abc', 'abc', 'abc', 'bb', 'bb', 'bb', 'c']})
df.head(7)
date value
0 2020-07-01 abc
1 2020-07-01 abc
2 2020-07-01 abc
3 2020-07-01 bb
4 2020-07-02 bb
5 2020-07-02 bb
6 2020-07-02 c
How can I get this:
date value count
0 2020-07-01 abc 3
1 2020-07-01 bb 1
2 2020-07-02 bb 2
3 2020-07-02 c 1
or this:
date value count
0 2020-07-01 abc 3
bb 1
1 2020-07-02 bb 2
c 1
Both solutions work equally fine for me.
Upvotes: 1
Views: 2162
Reputation: 26676
Use value_counts
in an .agg()
function
df.groupby('date').agg(count=('value', 'value_counts')).reset_index()
date value count
0 2020-07-01 abc 3
1 2020-07-01 bb 1
2 2020-07-02 bb 2
3 2020-07-02 c 1
Upvotes: 1
Reputation: 323226
We can do value_counts
s=df.groupby('date').value.value_counts().to_frame('count')
count
date value
2020-07-01 abc 3
bb 1
2020-07-02 bb 2
c 1
Upvotes: 1
Reputation: 3598
Try this:
df2 = df.groupby(by = ['date','value'])['value'].count().to_frame(name='count')
result:
count
date value
2020-07-01 abc 3
bb 1
2020-07-02 bb 2
c 1
or:
df2 = df.groupby(by = ['date','value'])['value'].count().to_frame(name='count').reset_index()
which gives:
date value count
0 2020-07-01 abc 3
1 2020-07-01 bb 1
2 2020-07-02 bb 2
3 2020-07-02 c 1
Upvotes: 3