Reputation: 654
I have a dataframe
that has one column and a timestamp index including anywhere from 2 to 7 days:
kWh
Timestamp
2017-07-08 06:00:00 0.00
2017-07-08 07:00:00 752.75
2017-07-08 08:00:00 1390.20
2017-07-08 09:00:00 2027.65
2017-07-08 10:00:00 2447.27
.... ....
2017-07-12 20:00:00 167.64
2017-07-12 21:00:00 0.00
2017-07-12 22:00:00 0.00
2017-07-12 23:00:00 0.00
I would like to transpose the kWh
column so that one day's worth of values (hourly granularity, so 24 values/day) fill up a row. And the next row is the next day of values and so on (so five days of forecasted data has five rows with 24 elements each).
Because my query of the data comes in the vertical format, and my regression and subsequent analysis already occurs in the vertical format, I don't want to change the process too much and am hoping there is a simpler way. I have tried giving a multi-index with df.index.hour
and then using unstack()
, but I get a huge dataframe
with NaN
values everywhere.
Is there an elegant way to do this?
Upvotes: 0
Views: 338
Reputation: 30258
Not sure why your MultiIndex
code doesn't work.
I'm assuming your MultiIndex
code is something along the lines, which gives the same output as the pivot
:
In []
df = pd.DataFrame({"kWh": [1]}, index=pd.date_range("2017-07-08",
"2017-07-12", freq="1H").rename("Timestamp")).cumsum()
df.index = pd.MultiIndex.from_arrays([df.index.date, df.index.hour], names=['Date','Hour'])
df.unstack()
Out[]:
kWh ... \
Hour 0 1 2 3 4 5 6 7 8 9 ...
Date ...
2017-07-08 1.0 2.0 3.0 4.0 5.0 6.0 7.0 8.0 9.0 10.0 ...
2017-07-09 25.0 26.0 27.0 28.0 29.0 30.0 31.0 32.0 33.0 34.0 ...
2017-07-10 49.0 50.0 51.0 52.0 53.0 54.0 55.0 56.0 57.0 58.0 ...
2017-07-11 73.0 74.0 75.0 76.0 77.0 78.0 79.0 80.0 81.0 82.0 ...
2017-07-12 97.0 NaN NaN NaN NaN NaN NaN NaN NaN NaN ...
Hour 14 15 16 17 18 19 20 21 22 23
Date
2017-07-08 15.0 16.0 17.0 18.0 19.0 20.0 21.0 22.0 23.0 24.0
2017-07-09 39.0 40.0 41.0 42.0 43.0 44.0 45.0 46.0 47.0 48.0
2017-07-10 63.0 64.0 65.0 66.0 67.0 68.0 69.0 70.0 71.0 72.0
2017-07-11 87.0 88.0 89.0 90.0 91.0 92.0 93.0 94.0 95.0 96.0
2017-07-12 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
[5 rows x 24 columns]
Upvotes: 0
Reputation: 353069
If we start from a frame like
In [25]: df = pd.DataFrame({"kWh": [1]}, index=pd.date_range("2017-07-08",
"2017-07-12", freq="1H").rename("Timestamp")).cumsum()
In [26]: df.head()
Out[26]:
kWh
Timestamp
2017-07-08 00:00:00 1
2017-07-08 01:00:00 2
2017-07-08 02:00:00 3
2017-07-08 03:00:00 4
2017-07-08 04:00:00 5
we can make date and hour columns and then pivot:
In [27]: df["date"] = df.index.date
In [28]: df["hour"] = df.index.hour
In [29]: df.pivot(index="date", columns="hour", values="kWh")
Out[29]:
hour 0 1 2 3 4 5 6 7 8 9 ... \
date ...
2017-07-08 1.0 2.0 3.0 4.0 5.0 6.0 7.0 8.0 9.0 10.0 ...
2017-07-09 25.0 26.0 27.0 28.0 29.0 30.0 31.0 32.0 33.0 34.0 ...
2017-07-10 49.0 50.0 51.0 52.0 53.0 54.0 55.0 56.0 57.0 58.0 ...
2017-07-11 73.0 74.0 75.0 76.0 77.0 78.0 79.0 80.0 81.0 82.0 ...
2017-07-12 97.0 NaN NaN NaN NaN NaN NaN NaN NaN NaN ...
hour 14 15 16 17 18 19 20 21 22 23
date
2017-07-08 15.0 16.0 17.0 18.0 19.0 20.0 21.0 22.0 23.0 24.0
2017-07-09 39.0 40.0 41.0 42.0 43.0 44.0 45.0 46.0 47.0 48.0
2017-07-10 63.0 64.0 65.0 66.0 67.0 68.0 69.0 70.0 71.0 72.0
2017-07-11 87.0 88.0 89.0 90.0 91.0 92.0 93.0 94.0 95.0 96.0
2017-07-12 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
[5 rows x 24 columns]
Upvotes: 1