diedro
diedro

Reputation: 623

pandas groupby and aggregator

I have the following dataframe:

dates,rr,ax,be
2018-01-01 00:00:00,45.73,47.63,45.83
2018-01-01 01:00:00,44.16,44.42,44.59
2018-01-01 02:00:00,42.24,42.34,40.22
2018-01-01 03:00:00,39.29,38.36,37.31
2018-01-01 04:00:00,36.0,36.87,32.88
2018-01-01 05:00:00,41.99,39.79,39.27
2018-01-01 06:00:00,42.25,42.08,43.62
2018-01-01 07:00:00,44.97,51.19,49.69
2018-01-01 08:00:00,45.0,59.69,49.98
2018-01-01 09:00:00,44.94,56.67,48.04
2018-01-01 10:00:00,45.04,53.54,46.85
2018-01-01 11:00:00,46.67,52.6,47.95
2018-01-01 12:00:00,46.99,50.77,46.6
2018-01-01 13:00:00,44.16,50.27,43.02
2018-01-01 14:00:00,45.26,50.64,44.2
2018-01-01 15:00:00,47.84,54.79,47.1
2018-01-01 16:00:00,50.1,60.17,50.83
2018-01-01 17:00:00,54.3,59.47,58.31
2018-01-01 18:00:00,51.91,60.16,63.5
2018-01-01 19:00:00,51.38,70.81,61.9
2018-01-01 20:00:00,49.2,62.65,59.62
2018-01-01 21:00:00,45.73,59.71,52.84
2018-01-01 22:00:00,44.84,50.96,51.43
2018-01-01 23:00:00,38.11,46.52,45.35
2018-01-02 00:00:00,19.19,49.62,41.61
2018-01-02 01:00:00,14.99,45.05,40.78
2018-01-02 02:00:00,11.0,45.18,39.59

then I group according to the hour in the index as follows:

df1 = df.assign(hour=lambda x: x.index.hour).groupby('hour')

at this point, I would like to compute the mean of each group. I do as follow:

aa = df1['hour'].mean()

This seems to not working. What am I missing? Thanks, Diego

Upvotes: 0

Views: 42

Answers (2)

jezrael
jezrael

Reputation: 863166

Solution looks good, should by simplify:

aa = df.groupby(df.index.hour.rename('hour')).mean()
print (aa)
          rr      ax      be
hour                        
0     32.460  48.625  43.720
1     29.575  44.735  42.685
2     26.620  43.760  39.905
3     39.290  38.360  37.310
4     36.000  36.870  32.880
5     41.990  39.790  39.270
6     42.250  42.080  43.620
7     44.970  51.190  49.690
8     45.000  59.690  49.980
9     44.940  56.670  48.040
10    45.040  53.540  46.850
11    46.670  52.600  47.950
12    46.990  50.770  46.600
13    44.160  50.270  43.020
14    45.260  50.640  44.200
15    47.840  54.790  47.100
16    50.100  60.170  50.830
17    54.300  59.470  58.310
18    51.910  60.160  63.500
19    51.380  70.810  61.900
20    49.200  62.650  59.620
21    45.730  59.710  52.840
22    44.840  50.960  51.430
23    38.110  46.520  45.350
    

In your solution:

df1 = df.assign(hour=lambda x: x.index.hour).groupby('hour')
#removed ['hour'] for get means of all numeric columns
aa = df1.mean()

Upvotes: 3

IoaTzimas
IoaTzimas

Reputation: 10624

Your dates column is not your index and it's also a string column. You can try the following:

df['hour']=df.dates.apply(lambda x: x[11:13])

df1=df.groupby('hour').mean()

del df['hour']

Output:

>>> print(df1)

          rr      ax      be
hour
00    32.460  48.625  43.720
01    29.575  44.735  42.685
02    26.620  43.760  39.905
03    39.290  38.360  37.310
04    36.000  36.870  32.880
05    41.990  39.790  39.270
06    42.250  42.080  43.620
07    44.970  51.190  49.690
08    45.000  59.690  49.980
09    44.940  56.670  48.040
10    45.040  53.540  46.850
11    46.670  52.600  47.950
12    46.990  50.770  46.600
13    44.160  50.270  43.020
14    45.260  50.640  44.200
15    47.840  54.790  47.100
16    50.100  60.170  50.830
17    54.300  59.470  58.310
18    51.910  60.160  63.500
19    51.380  70.810  61.900
20    49.200  62.650  59.620
21    45.730  59.710  52.840
22    44.840  50.960  51.430
23    38.110  46.520  45.350

Upvotes: 0

Related Questions