Francisco Parrilla

Reputation: 513

How to use pandas Grouper to get sum of values within each hour

I have the following table:

         Hora_Retiro  count_uses
0         00:00:18           1
1         00:00:34           1
2         00:02:27           1
3         00:03:13           1
4         00:06:45           1
...            ...         ...
748700    23:58:47           1
748701    23:58:49           1
748702    23:59:11           1
748703    23:59:47           1
748704    23:59:56           1

I want to group all values within each hour, so I can see the total number of uses per hour (00:00:00 - 23:00:00).

I have the following code:

hora_pico_aug = hora_pico.groupby(pd.Grouper(key="Hora_Retiro", freq='H')).count()

The Hora_Retiro column is of timedelta64[ns] type. The code gives the following output:

                count_uses
Hora_Retiro            
00:00:02           2566
01:00:02            602
02:00:02            295
03:00:02              5
04:00:02             10
05:00:02           4002
06:00:02          16075
07:00:02          39410
08:00:02          76272
09:00:02          56721
10:00:02          36036
11:00:02          32011
12:00:02          33725
13:00:02          41032
14:00:02          50747
15:00:02          50338
16:00:02          42347
17:00:02          54674
18:00:02          76056
19:00:02          57958
20:00:02          34286
21:00:02          22509
22:00:02          13894
23:00:02           7134

However, the index starts at 00:00:02, and I want it to start at 00:00:00 and then proceed in one-hour intervals. Something like this:

                count_uses
Hora_Retiro            
00:00:00           2565
01:00:00            603
02:00:00            295
03:00:00              5
04:00:00             10
05:00:00           4002
06:00:00          16075
07:00:00          39410
08:00:00          76272
09:00:00          56721
10:00:00          36036
11:00:00          32011
12:00:00          33725
13:00:00          41032
14:00:00          50747
15:00:00          50338
16:00:00          42347
17:00:00          54674
18:00:00          76056
19:00:00          57958
20:00:00          34286
21:00:00          22509
22:00:00          13894
23:00:00           7134

How can I make it start at 00:00:00?

Thanks for the help!

Upvotes: 0

Views: 352

Answers (2)

Valdi_Bo

Reputation: 30971

I assume that the Hora_Retiro column in your DataFrame is of Timedelta type. It cannot be datetime, because in that case the date part would be printed as well.

Indeed, your code creates groups starting at the minute / second taken from the first row.

To group by "full hours":

  • round each element in this column down to the full hour (floor rather than round: rounding to the nearest hour would move 00:45:00 into the 01:00:00 group),
  • then group (just by this floored value).

The code to do it is:

hora_pico.groupby(hora_pico.Hora_Retiro.dt.floor('H')).count_uses.count()
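To see why the choice between flooring and rounding matters for "within each hour" buckets, here is a minimal sketch on a small made-up frame. The sample values are hypothetical and only stand in for hora_pico; the lowercase "h" alias is used because recent pandas versions deprecate "H" (older versions may need "H" instead).

```python
import pandas as pd

# Hypothetical sample standing in for hora_pico (values invented for illustration)
hora_pico = pd.DataFrame({
    "Hora_Retiro": pd.to_timedelta(
        ["00:00:18", "00:45:34", "01:02:27", "01:59:59", "23:59:56"]
    ),
    "count_uses": [1, 1, 1, 1, 1],
})

# Floor: every timestamp goes to the start of the hour it falls in
floored = hora_pico.groupby(hora_pico["Hora_Retiro"].dt.floor("h"))["count_uses"].sum()

# Round: timestamps past the half hour move to the NEXT hour
rounded = hora_pico.groupby(hora_pico["Hora_Retiro"].dt.round("h"))["count_uses"].sum()

print(floored)  # groups at 00:00:00, 01:00:00 and 23:00:00
print(rounded)  # 23:59:56 ends up in a "1 days 00:00:00" group
```

With flooring, the index starts at 00:00:00 and never exceeds 23:00:00, which matches the output asked for in the question.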

However, you should decide what you want to count: rows, or the values in the count_uses column. In the second case, replace the count function with sum.
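The difference between counting rows and adding up values only shows when count_uses is not always 1 or contains missing entries. A small sketch, assuming made-up data with one missing value (the frame and the column names rows, non_null and total are illustrative, not from the question):

```python
import pandas as pd

# Hypothetical data: count_uses is not always 1 and has one missing value
df = pd.DataFrame({
    "Hora_Retiro": pd.to_timedelta(["00:05:00", "00:40:00", "00:50:00", "01:10:00"]),
    "count_uses": [1, 3, None, 2],
})

grouped = df.groupby(df["Hora_Retiro"].dt.floor("h"))["count_uses"]

summary = pd.DataFrame({
    "rows": grouped.size(),       # number of rows in each hour
    "non_null": grouped.count(),  # rows with a non-missing count_uses
    "total": grouped.sum(),       # sum of count_uses (missing values skipped)
})
print(summary)
```

In the question's data every count_uses is 1, so count and sum would agree there; they diverge as soon as a row carries a different value.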

Upvotes: 0

Abhilash Awasthi

Reputation: 797

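You can create an hour column from the Hora_Retiro column. Since the column is timedelta64[ns], take the hours from dt.components (dt.hour is only available on datetime columns):

df['hour'] = df['Hora_Retiro'].dt.components['hours']

Then group by hour: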

gpby_df = df.groupby('hour')['count_uses'].sum().reset_index()
gpby_df['hour'] = pd.to_datetime(gpby_df['hour'], format='%H').dt.time
gpby_df.columns = ['Hora_Retiro', 'sum_count_uses']
gpby_df

gives

  Hora_Retiro  sum_count_uses
0    00:00:00              14
1    09:00:00               1
2    10:00:00               2
3    20:00:00               2
Upvotes: 3
