Rivers31334
Rivers31334

Reputation: 654

Transposing dataframe column, creating different rows per day

I have a dataframe that has one column and a timestamp index including anywhere from 2 to 7 days:

                         kWh
Timestamp                   
2017-07-08 06:00:00     0.00
2017-07-08 07:00:00   752.75
2017-07-08 08:00:00  1390.20
2017-07-08 09:00:00  2027.65
2017-07-08 10:00:00  2447.27    
        ....           ....
2017-07-12 20:00:00   167.64
2017-07-12 21:00:00     0.00
2017-07-12 22:00:00     0.00
2017-07-12 23:00:00     0.00

I would like to transpose the kWh column so that one day's worth of values (hourly granularity, so 24 values/day) fill up a row. And the next row is the next day of values and so on (so five days of forecasted data has five rows with 24 elements each).

Because my query of the data comes in the vertical format, and my regression and subsequent analysis already occurs in the vertical format, I don't want to change the process too much and am hoping there is a simpler way. I have tried giving a multi-index with df.index.hour and then using unstack(), but I get a huge dataframe with NaN values everywhere.

Is there an elegant way to do this?

Upvotes: 0

Views: 338

Answers (2)

AChampion
AChampion

Reputation: 30258

Not sure why your MultiIndex code doesn't work. I'm assuming your MultiIndex code is something along the lines, which gives the same output as the pivot:

In []
df = pd.DataFrame({"kWh": [1]}, index=pd.date_range("2017-07-08", 
                   "2017-07-12", freq="1H").rename("Timestamp")).cumsum()
df.index = pd.MultiIndex.from_arrays([df.index.date, df.index.hour], names=['Date','Hour'])
df.unstack()

Out[]:
             kWh                                                        ...   \
Hour          0     1     2     3     4     5     6     7     8     9   ...    
Date                                                                    ...    
2017-07-08   1.0   2.0   3.0   4.0   5.0   6.0   7.0   8.0   9.0  10.0  ...    
2017-07-09  25.0  26.0  27.0  28.0  29.0  30.0  31.0  32.0  33.0  34.0  ...    
2017-07-10  49.0  50.0  51.0  52.0  53.0  54.0  55.0  56.0  57.0  58.0  ...    
2017-07-11  73.0  74.0  75.0  76.0  77.0  78.0  79.0  80.0  81.0  82.0  ...    
2017-07-12  97.0   NaN   NaN   NaN   NaN   NaN   NaN   NaN   NaN   NaN  ...    


Hour          14    15    16    17    18    19    20    21    22    23  
Date                                                                    
2017-07-08  15.0  16.0  17.0  18.0  19.0  20.0  21.0  22.0  23.0  24.0  
2017-07-09  39.0  40.0  41.0  42.0  43.0  44.0  45.0  46.0  47.0  48.0  
2017-07-10  63.0  64.0  65.0  66.0  67.0  68.0  69.0  70.0  71.0  72.0  
2017-07-11  87.0  88.0  89.0  90.0  91.0  92.0  93.0  94.0  95.0  96.0  
2017-07-12   NaN   NaN   NaN   NaN   NaN   NaN   NaN   NaN   NaN   NaN  

[5 rows x 24 columns]

Upvotes: 0

DSM
DSM

Reputation: 353069

If we start from a frame like

In [25]: df = pd.DataFrame({"kWh": [1]}, index=pd.date_range("2017-07-08", 
             "2017-07-12", freq="1H").rename("Timestamp")).cumsum()

In [26]: df.head()
Out[26]: 
                     kWh
Timestamp               
2017-07-08 00:00:00    1
2017-07-08 01:00:00    2
2017-07-08 02:00:00    3
2017-07-08 03:00:00    4
2017-07-08 04:00:00    5

we can make date and hour columns and then pivot:

In [27]: df["date"] = df.index.date

In [28]: df["hour"] = df.index.hour

In [29]: df.pivot(index="date", columns="hour", values="kWh")
Out[29]: 
hour          0     1     2     3     4     5     6     7     8     9   ...   \
date                                                                    ...    
2017-07-08   1.0   2.0   3.0   4.0   5.0   6.0   7.0   8.0   9.0  10.0  ...    
2017-07-09  25.0  26.0  27.0  28.0  29.0  30.0  31.0  32.0  33.0  34.0  ...    
2017-07-10  49.0  50.0  51.0  52.0  53.0  54.0  55.0  56.0  57.0  58.0  ...    
2017-07-11  73.0  74.0  75.0  76.0  77.0  78.0  79.0  80.0  81.0  82.0  ...    
2017-07-12  97.0   NaN   NaN   NaN   NaN   NaN   NaN   NaN   NaN   NaN  ...    

hour          14    15    16    17    18    19    20    21    22    23  
date                                                                    
2017-07-08  15.0  16.0  17.0  18.0  19.0  20.0  21.0  22.0  23.0  24.0  
2017-07-09  39.0  40.0  41.0  42.0  43.0  44.0  45.0  46.0  47.0  48.0  
2017-07-10  63.0  64.0  65.0  66.0  67.0  68.0  69.0  70.0  71.0  72.0  
2017-07-11  87.0  88.0  89.0  90.0  91.0  92.0  93.0  94.0  95.0  96.0  
2017-07-12   NaN   NaN   NaN   NaN   NaN   NaN   NaN   NaN   NaN   NaN  

[5 rows x 24 columns]

Upvotes: 1

Related Questions