Reputation: 636
I have a Pandas Series of solar radiation values, indexed by timestamps at one-minute resolution. E.g.:
index solar_radiation
2019-01-01 08:01 0
2019-01-01 08:02 10
2019-01-01 08:03 15
...
2019-01-10 23:59 0
I would like to convert this to a table (DataFrame) where each hour is averaged into one column, e.g.:
index 00 01 02 03 04 05 06 ... 23
2019-01-01 0 0 0 0 0 3 10 ... 0
2019-01-02 0 0 0 0 0 4 12 ... 0
....
2019-01-10 0 0 0 0 0 6 24... 0
I have tried to look into groupby, but with it I am only able to group all hours into one combined bin, not one per day. Any hints or suggestions as to how I can achieve this with groupby, or should I just brute-force it and iterate over each hour?
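For anyone who wants to experiment, a minimal reproducible Series of this shape can be built like so (the values here are random placeholders, not real radiation data):

```python
import pandas as pd
import numpy as np

# Minute-resolution timestamps matching the range described above.
idx = pd.date_range('2019-01-01 08:01', '2019-01-10 23:59', freq='min')

# Random placeholder values standing in for solar radiation readings.
solar = pd.Series(np.random.default_rng(0).integers(0, 100, len(idx)),
                  index=idx, name='solar_radiation')
```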
Upvotes: 1
Views: 242
Reputation: 863256
Solutions for a one-column DataFrame:

Aggregate the mean by the DatetimeIndex, using DatetimeIndex.floor to remove the times and DatetimeIndex.hour to get the hour, then reshape with Series.unstack and add the missing values with DataFrame.reindex:
#if necessary
#df.index = pd.to_datetime(df.index)
rng = pd.date_range(df.index.min().floor('D'), df.index.max().floor('D'))
df1 = (df.groupby([df.index.floor('D'), df.index.hour])['solar_radiation']
.mean()
.unstack(fill_value=0)
.reindex(columns=range(0, 24), fill_value=0, index=rng))
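As a quick self-contained sanity check of this approach, here is the same pipeline run on a tiny made-up frame (timestamps and values are invented for illustration):

```python
import pandas as pd

# Tiny made-up frame: three minutes in the 08:00 hour of Jan 1,
# plus one reading on Jan 2.
df = pd.DataFrame(
    {'solar_radiation': [0, 10, 15, 20]},
    index=pd.to_datetime(['2019-01-01 08:01', '2019-01-01 08:02',
                          '2019-01-01 08:03', '2019-01-02 09:30']))

# Daily index spanning the data, so days without readings still appear.
rng = pd.date_range(df.index.min().floor('D'), df.index.max().floor('D'))

out = (df.groupby([df.index.floor('D'), df.index.hour])['solar_radiation']
         .mean()
         .unstack(fill_value=0)
         .reindex(columns=range(24), fill_value=0, index=rng))
# out.loc['2019-01-01', 8] is the mean of 0, 10, 15 for the 08:00 hour.
```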
Another solution: group hourly with Grouper, replace missing values with 0, and reshape with Series.unstack:
#if necessary
#df.index = pd.to_datetime(df.index)
df1 = df.groupby(pd.Grouper(freq='H'))[['solar_radiation']].mean().fillna(0)
df1 = df1.set_index([df1.index.date, df1.index.hour])['solar_radiation'].unstack(fill_value=0)
print (df1)
0 1 2 3 4 5 6 7 8 9 ... 14 \
2019-01-01 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 8.333333 0.0 ... 0.0
2019-01-02 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.000000 0.0 ... 0.0
2019-01-03 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.000000 0.0 ... 0.0
2019-01-04 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.000000 0.0 ... 0.0
2019-01-05 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.000000 0.0 ... 0.0
2019-01-06 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.000000 0.0 ... 0.0
2019-01-07 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.000000 0.0 ... 0.0
2019-01-08 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.000000 0.0 ... 0.0
2019-01-09 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.000000 0.0 ... 0.0
2019-01-10 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.000000 0.0 ... 0.0
15 16 17 18 19 20 21 22 23
2019-01-01 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
2019-01-02 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
2019-01-03 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
2019-01-04 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
2019-01-05 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
2019-01-06 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
2019-01-07 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
2019-01-08 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
2019-01-09 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
2019-01-10 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
[10 rows x 24 columns]
Solutions for a Series with a DatetimeIndex:
rng = pd.date_range(df.index.min().floor('D'), df.index.max().floor('D'))
df1 = (df.groupby([df.index.floor('D'), df.index.hour])
.mean()
.unstack(fill_value=0)
.reindex(columns=range(0, 24), fill_value=0, index=rng))
df1 = df.groupby(pd.Grouper(freq='H')).mean().to_frame('new').fillna(0)
df1 = df1.set_index([df1.index.date, df1.index.hour])['new'].unstack(fill_value=0)
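The Grouper variant can also be checked on a tiny made-up Series (values are illustrative; `'h'` is the lowercase hourly alias, which newer pandas prefers over `'H'`):

```python
import pandas as pd

# Three minute-readings inside a single hour.
s = pd.Series([0, 10, 15],
              index=pd.to_datetime(['2019-01-01 08:01',
                                    '2019-01-01 08:02',
                                    '2019-01-01 08:03']))

# Hourly mean, then reshape to one row per date with hours as columns.
res = s.groupby(pd.Grouper(freq='h')).mean().to_frame('new').fillna(0)
res = (res.set_index([res.index.date, res.index.hour])['new']
          .unstack(fill_value=0))
```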
Upvotes: 1
Reputation: 42916
If I understand you correctly, you want to resample hourly. Then we can build a MultiIndex from the date and time, and unstack the time level into columns:
df = df.resample('H').mean()
df.set_index([df.index.date, df.index.time], inplace=True)
df = df.unstack(level=[1])
Which gives us the following output:
print(df)
solar_radiation \
00:00:00 01:00:00 02:00:00 03:00:00 04:00:00 05:00:00
2019-01-01 NaN NaN NaN NaN NaN NaN
2019-01-02 NaN NaN NaN NaN NaN NaN
2019-01-03 NaN NaN NaN NaN NaN NaN
2019-01-04 NaN NaN NaN NaN NaN NaN
2019-01-05 NaN NaN NaN NaN NaN NaN
2019-01-06 NaN NaN NaN NaN NaN NaN
2019-01-07 NaN NaN NaN NaN NaN NaN
2019-01-08 NaN NaN NaN NaN NaN NaN
2019-01-09 NaN NaN NaN NaN NaN NaN
2019-01-10 NaN NaN NaN NaN NaN NaN
... \
06:00:00 07:00:00 08:00:00 09:00:00 ... 14:00:00 15:00:00
2019-01-01 NaN NaN 8.333333 NaN ... NaN NaN
2019-01-02 NaN NaN NaN NaN ... NaN NaN
2019-01-03 NaN NaN NaN NaN ... NaN NaN
2019-01-04 NaN NaN NaN NaN ... NaN NaN
2019-01-05 NaN NaN NaN NaN ... NaN NaN
2019-01-06 NaN NaN NaN NaN ... NaN NaN
2019-01-07 NaN NaN NaN NaN ... NaN NaN
2019-01-08 NaN NaN NaN NaN ... NaN NaN
2019-01-09 NaN NaN NaN NaN ... NaN NaN
2019-01-10 NaN NaN NaN NaN ... NaN NaN
\
16:00:00 17:00:00 18:00:00 19:00:00 20:00:00 21:00:00 22:00:00
2019-01-01 NaN NaN NaN NaN NaN NaN NaN
2019-01-02 NaN NaN NaN NaN NaN NaN NaN
2019-01-03 NaN NaN NaN NaN NaN NaN NaN
2019-01-04 NaN NaN NaN NaN NaN NaN NaN
2019-01-05 NaN NaN NaN NaN NaN NaN NaN
2019-01-06 NaN NaN NaN NaN NaN NaN NaN
2019-01-07 NaN NaN NaN NaN NaN NaN NaN
2019-01-08 NaN NaN NaN NaN NaN NaN NaN
2019-01-09 NaN NaN NaN NaN NaN NaN NaN
2019-01-10 NaN NaN NaN NaN NaN NaN NaN
23:00:00
2019-01-01 NaN
2019-01-02 NaN
2019-01-03 NaN
2019-01-04 NaN
2019-01-05 NaN
2019-01-06 NaN
2019-01-07 NaN
2019-01-08 NaN
2019-01-09 NaN
2019-01-10 0.0
[10 rows x 24 columns]
Note that I got a lot of NaN values, since you provided only a couple of rows of data.
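If you want the 00 … 23 integer column labels from the question instead of full times, a small variation of the above works: group by the hour attribute rather than `time`, drop the column level, and fill the missing hours. A sketch on made-up data (timestamps and values are illustrative):

```python
import pandas as pd

# Three minute-readings inside a single hour.
df = pd.DataFrame(
    {'solar_radiation': [0, 10, 15]},
    index=pd.to_datetime(['2019-01-01 08:01',
                          '2019-01-01 08:02',
                          '2019-01-01 08:03']))

out = df.resample('h').mean()
out = out.set_index([out.index.date, out.index.hour]).unstack(level=1)
out.columns = out.columns.droplevel(0)          # drop the 'solar_radiation' level
out = out.reindex(columns=range(24)).fillna(0)  # fill missing hours with 0
```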
Upvotes: 3