Yannick
Yannick

Reputation: 419

Group Pandas data by hour of the day

I used the Following code to generate random date and values:

import pandas as pd
import numpy as np

time = pd.date_range('1/1/2000', periods=2000, freq='5min')

series = pd.Series(np.random.randint(100, size=2000), index=time)

the output looks like this:

2000-01-01 00:00:00    40
2000-01-01 00:05:00    13
2000-01-01 00:10:00    99
2000-01-01 00:15:00    72
2000-01-01 00:20:00     4
2000-01-01 00:25:00    36
2000-01-01 00:30:00    24
2000-01-01 00:35:00    20
2000-01-01 00:40:00    83
2000-01-01 00:45:00    44

Then I sort this data by the index hour value then aggregate it by the average which looks like this:

0     50.380952
1     49.380952
2     49.904762
3     53.273810
4     47.178571
5     46.095238
6     49.047619
7     44.297619
8     53.119048
9     48.261905
10    45.166667
11    54.214286
12    50.714286
13    56.130952
14    50.916667
15    42.428571
16    46.880952
17    56.892857
18    54.071429
19    47.607143
20    50.940476
21    50.511905
22    44.550000
23    50.250000

But what should I do now if I want to group all the data just by the index hour value and without the average so that I can get all the values associated with their index?

Thanks in advance.

Regards,

Upvotes: 3

Views: 4761

Answers (1)

jezrael
jezrael

Reputation: 863541

If want aggregate by hours with means:

np.random.seed(456)
time = pd.date_range('1/1/2000', periods=2000, freq='5min')
series = pd.Series(np.random.randint(100, size=2000), index=time)

s = series.groupby(series.index.hour).mean()
print (s)
0     49.392857
1     52.523810
2     53.047619
3     49.083333
4     49.785714
5     49.071429
6     52.476190
7     47.821429
8     52.190476
9     50.000000
10    49.035714
11    52.988095
12    52.785714
13    52.023810
14    46.964286
15    52.095238
16    51.047619
17    52.166667
18    48.357143
19    51.416667
20    45.214286
21    46.130952
22    49.750000
23    48.527778
dtype: float64

But if need MultiIndex by hours:

series.index = [series.index.hour, series.index]
print (series)
0   2000-01-01 00:00:00    27
    2000-01-01 00:05:00    43
    2000-01-01 00:10:00    89
    2000-01-01 00:15:00    42
    2000-01-01 00:20:00    28
    2000-01-01 00:25:00    79
    2000-01-01 00:30:00    60
    2000-01-01 00:35:00    45
    2000-01-01 00:40:00    37
    2000-01-01 00:45:00    92
    2000-01-01 00:50:00    39
    2000-01-01 00:55:00    81
1   2000-01-01 01:00:00    11
    2000-01-01 01:05:00    77
    2000-01-01 01:10:00    69
    2000-01-01 01:15:00    98

...

Then is possible select by hours:

print (series.loc[0])
2000-01-01 00:00:00    27
2000-01-01 00:05:00    43
2000-01-01 00:10:00    89
2000-01-01 00:15:00    42
2000-01-01 00:20:00    28
2000-01-01 00:25:00    79
2000-01-01 00:30:00    60
2000-01-01 00:35:00    45
2000-01-01 00:40:00    37
2000-01-01 00:45:00    92
2000-01-01 00:50:00    39
2000-01-01 00:55:00    81
2000-01-02 00:00:00    82
2000-01-02 00:05:00    69
2000-01-02 00:10:00    99
2000-01-02 00:15:00    17
2000-01-02 00:20:00    59
...

Also if need means with no change DatetimeIndex:

s1 = series.groupby(series.index.hour).transform('mean')
print (s1)
2000-01-01 00:00:00    49.392857
2000-01-01 00:05:00    49.392857
2000-01-01 00:10:00    49.392857
2000-01-01 00:15:00    49.392857
2000-01-01 00:20:00    49.392857
2000-01-01 00:25:00    49.392857
2000-01-01 00:30:00    49.392857
2000-01-01 00:35:00    49.392857
2000-01-01 00:40:00    49.392857
2000-01-01 00:45:00    49.392857
2000-01-01 00:50:00    49.392857
2000-01-01 00:55:00    49.392857
2000-01-01 01:00:00    52.523810
2000-01-01 01:05:00    52.523810
2000-01-01 01:10:00    52.523810
2000-01-01 01:15:00    52.523810
2000-01-01 01:20:00    52.523810
2000-01-01 01:25:00    52.523810
2000-01-01 01:30:00    52.523810
...

EDIT:

For list per hours use:

s = series.groupby(series.index.hour).apply(list)
print (s)
0     [27, 43, 89, 42, 28, 79, 60, 45, 37, 92, 39, 8...
1     [11, 77, 69, 98, 78, 84, 34, 66, 4, 8, 85, 62,...
2     [16, 41, 10, 72, 44, 35, 48, 51, 99, 53, 22, 3...
3     [56, 22, 74, 85, 81, 6, 44, 44, 49, 43, 95, 11...
4     [21, 90, 89, 76, 62, 20, 66, 50, 68, 79, 69, 4...
5     [51, 85, 31, 58, 97, 10, 91, 25, 4, 11, 94, 28...
6     [5, 71, 62, 57, 62, 87, 12, 41, 43, 47, 25, 15...
7     [84, 17, 26, 32, 14, 76, 72, 35, 8, 60, 79, 27...
8     [15, 30, 80, 53, 10, 97, 71, 83, 37, 44, 89, 1...
9     [58, 20, 98, 77, 75, 26, 63, 26, 24, 62, 93, 6...
10    [39, 61, 92, 43, 61, 73, 86, 64, 26, 0, 75, 11...
11    [24, 13, 13, 54, 50, 38, 22, 46, 67, 15, 29, 4...
12    [21, 56, 16, 63, 46, 79, 11, 85, 87, 18, 66, 9...
13    [10, 89, 66, 80, 60, 2, 6, 19, 77, 81, 38, 48,...
14    [17, 64, 90, 91, 71, 32, 77, 9, 76, 14, 9, 79,...
15    [95, 75, 49, 34, 5, 31, 43, 68, 84, 48, 25, 69...
16    [13, 68, 87, 96, 6, 83, 9, 5, 29, 93, 57, 92, ...
17    [77, 6, 73, 41, 76, 93, 11, 50, 72, 84, 82, 53...
18    [95, 11, 61, 56, 30, 24, 24, 9, 0, 65, 96, 82,...
19    [31, 14, 98, 67, 7, 54, 29, 60, 77, 83, 45, 70...
20    [4, 15, 37, 78, 79, 59, 63, 97, 14, 74, 33, 2,...
21    [88, 69, 31, 20, 41, 10, 41, 6, 36, 27, 63, 49...
22    [4, 90, 70, 66, 92, 46, 54, 47, 6, 54, 62, 80,...
23    [27, 23, 21, 18, 29, 39, 77, 88, 21, 86, 7, 45...
dtype: object

Upvotes: 3

Related Questions