PParker
PParker

Reputation: 1511

Group by date and count values in pandas dataframe

I have the following pandas dataframe (I am using python):

import pandas as pd

df = pd.DataFrame({
'date': ['2020-07-01', '2020-07-01', '2020-07-01', '2020-07-01', '2020-07-02', '2020-07-02', '2020-07-02'],
'value': ['abc', 'abc', 'abc', 'bb', 'bb', 'bb', 'c']})

df.head(7)

    date    value
0   2020-07-01  abc
1   2020-07-01  abc
2   2020-07-01  abc
3   2020-07-01  bb
4   2020-07-02  bb
5   2020-07-02  bb
6   2020-07-02  c

How can I get this:

    date    value  count
0   2020-07-01  abc  3
1   2020-07-01  bb   1
2   2020-07-02  bb   2
3   2020-07-02  c    1

or this:

    date    value  count
0   2020-07-01  abc  3
                bb   1
1   2020-07-02  bb   2
                c    1

Both solutions work equally fine for me.

Upvotes: 1

Views: 2162

Answers (3)

wwnde
wwnde

Reputation: 26676

Use value_counts in an .agg() function

df.groupby('date').agg(count=('value', 'value_counts')).reset_index()



     date value  count
0  2020-07-01   abc      3
1  2020-07-01    bb      1
2  2020-07-02    bb      2
3  2020-07-02     c      1

Upvotes: 1

BENY
BENY

Reputation: 323226

We can do value_counts

s=df.groupby('date').value.value_counts().to_frame('count')
                  count
date       value       
2020-07-01 abc        3
           bb         1
2020-07-02 bb         2
           c          1

Upvotes: 1

ipj
ipj

Reputation: 3598

Try this:

df2 = df.groupby(by = ['date','value'])['value'].count().to_frame(name='count')

result:

                  count
date       value       
2020-07-01 abc        3
           bb         1
2020-07-02 bb         2
           c          1

or:

df2 = df.groupby(by = ['date','value'])['value'].count().to_frame(name='count').reset_index()

which gives:

         date value  count
0  2020-07-01   abc      3
1  2020-07-01    bb      1
2  2020-07-02    bb      2
3  2020-07-02     c      1

Upvotes: 3

Related Questions