pragmatic learner
pragmatic learner

Reputation: 457

How to count by time frequency using groupby - pandas

I'm trying to count a frequency of 2 events by the month using 2 columns from my df. What I have done so far has counted all events by the unique time which is not efficient enough as there are too many results. I wish to create a graph with the results afterwards.

I've tried adapting my code by the answers on the SO questions:

but can not seem to get the command working when I input freq='day' within the groupby command.

My code is:

print(df.groupby(['Priority', 'Create Time']).Priority.count())

which initially produced something like 170000 results in the structure of the following:

Priority  Create Time        
1.0       2011-01-01 00:00:00    1
          2011-01-01 00:01:11    1
          2011-01-01 00:02:10    1
                  ...

2.0       2011-01-01 00:01:25    1
          2011-01-01 00:01:35    1
                  ...

But now for some reason (I'm using Jupyter Notebook) it only produces:

Priority  Create Time        
1.0       2011-01-01 00:00:00    1
          2011-01-01 00:01:11    1
          2011-01-01 00:02:10    1
2.0       2011-01-01 00:01:25    1
          2011-01-01 00:01:35    1
Name: Priority, dtype: int64

No idea why the output has changed to only 5 results (maybe I unknowingly changed something).

I would like the results to be in the following format:

Priority  month     Count     
1.0       2011-01     a
          2011-02     b
          2011-03     c
                ...

2.0       2011-01     x
          2011-02     y
          2011-03     z
                ...

Top points for showing how to change the frequency correctly for other values as well, for example hour/day/month/year. With the answers please could you explain what is going on in your code as I am new and learning pandas and wish to understand the process. Thank you.

Upvotes: 2

Views: 829

Answers (1)

jezrael
jezrael

Reputation: 863741

One possible solution is convert datetime column to months periods by Series.dt.to_period:

print(df.groupby(['Priority', df['Create Time'].dt.to_period('m')]).Priority.count())

Or use Grouper:

print(df.groupby(['Priority', pd.Grouper(key='Create Time', freq='MS')]).Priority.count())

Sample:

np.random.seed(123)

df = pd.DataFrame({'Create Time':pd.date_range('2019-01-01', freq='10D', periods=10),
                   'Priority':np.random.choice([0,1], size=10)})

print (df)
  Create Time  Priority
0  2019-01-01         0
1  2019-01-11         1
2  2019-01-21         0
3  2019-01-31         0
4  2019-02-10         0
5  2019-02-20         0
6  2019-03-02         0
7  2019-03-12         1
8  2019-03-22         1
9  2019-04-01         0

print(df.groupby(['Priority', df['Create Time'].dt.to_period('m')]).Priority.count())
Priority  Create Time
0         2019-01        3
          2019-02        2
          2019-03        1
          2019-04        1
1         2019-01        1
          2019-03        2
Name: Priority, dtype: int64

print(df.groupby(['Priority', pd.Grouper(key='Create Time', freq='MS')]).Priority.count())
Priority  Create Time
0         2019-01-01     3
          2019-02-01     2
          2019-03-01     1
          2019-04-01     1
1         2019-01-01     1
          2019-03-01     2
Name: Priority, dtype: int64

Upvotes: 2

Related Questions