Gonçalo Peres
Gonçalo Peres

Reputation: 13582

DataFrame groupby hour and do the mean for a specific column

Considering the following DataFrame

    Year    Month   Day Hour    1   2   4   5   6   7   Solar
0   2019    01  01  00  3856    6074    2123    3634    2219    2449    29
1   2019    01  01  00  3856    6072    2038    3443    2376    2644    29
2   2019    01  01  00  3862    6074    1916    3341    2734    2522    29
3   2019    01  01  00  3815    6074    1882    3135    2880    2556    29
4   2019    01  01  00  3751    6073    1855    3055    2940    2651    30
5   2019    01  01  00  3763    6071    1844    2978    2907    2628    29
6   2019    01  01  01  3808    6072    1842    2898    2868    2557    29
7   2019    01  01  01  3799    6074    1743    3559    2838    1844    29
8   2019    01  01  01  3810    6073    1688    3305    2766    1958    29
9   2019    01  01  01  3798    6075    1696    3142    2645    2048    30
10  2019    01  01  01  3740    6072    1678    3096    2598    2056    29

In order to get the mean of "Solar" (the 11th column) per Hour (the 3rd column), I have tried

1.

df['Solar_Mean'] = df.groupby(['Hour'])['Solar'].mean()

"Solar_Mean" will get only nan

    Solar_Mean
0   nan
1   nan
2   nan
3   nan
4   nan
5   nan
df['Solar_Mean'] = df.groupby(['Hour'])['Solar'].transform('mean')

Which gives

    Solar_Mean
0   272.4290164663996
1   272.4290164663996
2   272.4290164663996
3   272.4290164663996
4   272.4290164663996
5   272.4290164663996
df['Solar_Mean'] = df.groupby(['Hour'])['Solar'].transform(np.mean)

Which gives me the same as in the 2nd approach.

    Solar_Mean
0   272.4290164663996
1   272.4290164663996
2   272.4290164663996
3   272.4290164663996
4   272.4290164663996
5   272.4290164663996

As there are 6 files per hour, if one takes the sum of the first 6 files and divide them by 6, one gets 29.1666666667 which should be the right value. What am I missing here?

Upvotes: 1

Views: 151

Answers (1)

Gonçalo Peres
Gonçalo Peres

Reputation: 13582

I was not considering the Year, Month and Day when using the groupby. It should've been like this

df['Solar_Mean'] = df.groupby(['Year', 'Month', 'Day', 'Hour'])['Solar'].transform('mean')

Which gives

    Solar_Mean
0   29.166666666666668
1   29.166666666666668
2   29.166666666666668
3   29.166666666666668
4   29.166666666666668
5   29.166666666666668

Upvotes: 1

Related Questions