Reputation: 13582
Considering the following DataFrame
Year Month Day Hour 1 2 4 5 6 7 Solar
0 2019 01 01 00 3856 6074 2123 3634 2219 2449 29
1 2019 01 01 00 3856 6072 2038 3443 2376 2644 29
2 2019 01 01 00 3862 6074 1916 3341 2734 2522 29
3 2019 01 01 00 3815 6074 1882 3135 2880 2556 29
4 2019 01 01 00 3751 6073 1855 3055 2940 2651 30
5 2019 01 01 00 3763 6071 1844 2978 2907 2628 29
6 2019 01 01 01 3808 6072 1842 2898 2868 2557 29
7 2019 01 01 01 3799 6074 1743 3559 2838 1844 29
8 2019 01 01 01 3810 6073 1688 3305 2766 1958 29
9 2019 01 01 01 3798 6075 1696 3142 2645 2048 30
10 2019 01 01 01 3740 6072 1678 3096 2598 2056 29
In order to get the mean of "Solar" (the 11th column) per Hour (the 3rd column), I have tried
1.
df['Solar_Mean'] = df.groupby(['Hour'])['Solar'].mean()
"Solar_Mean" will get only nan
Solar_Mean
0 nan
1 nan
2 nan
3 nan
4 nan
5 nan
df['Solar_Mean'] = df.groupby(['Hour'])['Solar'].transform('mean')
Which gives
Solar_Mean
0 272.4290164663996
1 272.4290164663996
2 272.4290164663996
3 272.4290164663996
4 272.4290164663996
5 272.4290164663996
df['Solar_Mean'] = df.groupby(['Hour'])['Solar'].transform(np.mean)
Which gives me the same as in the 2nd approach.
Solar_Mean
0 272.4290164663996
1 272.4290164663996
2 272.4290164663996
3 272.4290164663996
4 272.4290164663996
5 272.4290164663996
As there are 6 files per hour, if one takes the sum of the first 6 files and divide them by 6, one gets 29.1666666667
which should be the right value. What am I missing here?
Upvotes: 1
Views: 151
Reputation: 13582
I was not considering the Year
, Month
and Day
when using the groupby
. It should've been like this
df['Solar_Mean'] = df.groupby(['Year', 'Month', 'Day', 'Hour'])['Solar'].transform('mean')
Which gives
Solar_Mean
0 29.166666666666668
1 29.166666666666668
2 29.166666666666668
3 29.166666666666668
4 29.166666666666668
5 29.166666666666668
Upvotes: 1