Timeseries dataset to hourly feature dataset in Pandas

Question

I have a pandas DataFrame of time-series data like this:

            Timestamp       X
0   2016-12-01 00:00:00 0.186090
1   2016-12-01 00:10:00 0.203160
2   2016-12-01 00:20:00 0.216228
3   2016-12-01 00:30:00 0.220723
4   2016-12-01 00:40:00 0.263620
5   2016-12-01 00:50:00 0.287217
6   2016-12-01 01:00:00 0.282319
7   2016-12-01 01:10:00 0.242778
8   2016-12-01 01:20:00 0.235190
9   2016-12-01 01:30:00 0.210077
10  2016-12-01 01:40:00 0.251426
11  2016-12-01 01:50:00 0.238118
12  2016-12-01 02:00:00 0.262105
13  2016-12-01 02:10:00 0.270865
14  2016-12-01 02:20:00 0.281123
15  2016-12-01 02:30:00 0.276698
16  2016-12-01 02:40:00 0.296046
17  2016-12-01 02:50:00 0.308164
18  2016-12-01 03:00:00 0.313092
19  2016-12-01 03:10:00 0.233784

I want to convert the dataset into something like this

Date          F1     F2        F3        F4      F5        F6       .... F145
2016-12-01 0.186090  0.203160  0.216228  0.220723 0.263620  0.287217 .........
2016-12-02 ..................................................................

That is, I want to build another DataFrame with 145 feature columns, each denoting a particular time block of the day: F1 denotes 00:00:00, F2 denotes 00:10:00, ..., F144 denotes 23:50:00, and F145 denotes 00:00:00 of the next day.

What is the most efficient way of achieving this in pandas?

Pivoting can be used for this kind of task, but how do I pivot on a timestamp column?

jezrael · Accepted Answer

First strip the time part of each timestamp, either with dt.floor('D'), which keeps datetimes, or with dt.date, which gives Python date objects. Then create a column holding the time of day and pivot:

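import pandas as pd

# df is the question's DataFrame with columns Timestamp and X
df['Timestamp'] = pd.to_datetime(df['Timestamp'])
df['Date'] = df['Timestamp'].dt.floor('D')   # midnight of each day
df['Hours'] = df['Timestamp'].dt.time        # time of day as datetime.time

# newer pandas versions require keyword arguments here
df = df.pivot(index='Date', columns='Hours', values='X')
print(df)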
Hours       00:00:00  00:10:00  00:20:00  00:30:00  00:40:00  00:50:00  \
Date                                                                     
2016-12-01   0.18609   0.20316  0.216228  0.220723   0.26362  0.287217   

Hours       01:00:00  01:10:00  01:20:00  01:30:00  01:40:00  01:50:00  \
Date                                                                     
2016-12-01  0.282319  0.242778   0.23519  0.210077  0.251426  0.238118   

Hours       02:00:00  02:10:00  02:20:00  02:30:00  02:40:00  02:50:00  \
Date                                                                     
2016-12-01  0.262105  0.270865  0.281123  0.276698  0.296046  0.308164   

Hours       03:00:00  03:10:00  
Date                            
2016-12-01  0.313092  0.233784

Next, rename the columns to sequential labels F1, F2, ... and turn the Date index back into a column:

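# label columns by position: F1 for the first time slot, F2 for the second, ...
df.columns = [f'F{x+1}' for x in range(len(df.columns))]
# move Date from the index into a regular column and drop the columns' axis name
df = df.reset_index().rename_axis(None, axis=1)
print(df)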
        Date       F1       F2        F3        F4       F5        F6  \
0 2016-12-01  0.18609  0.20316  0.216228  0.220723  0.26362  0.287217   

         F7        F8       F9  ...       F11       F12       F13       F14  \
0  0.282319  0.242778  0.23519  ...  0.251426  0.238118  0.262105  0.270865   

        F15       F16       F17       F18       F19       F20  
0  0.281123  0.276698  0.296046  0.308164  0.313092  0.233784  

[1 rows x 21 columns]
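
Note that the F labels follow column position, so with the 20 sample readings they stop at F20. If some time slot is absent on every day, all later labels would shift by one. A minimal sketch of one way to guard against that, assuming the data sits on a fixed 10-minute grid: reindex the pivoted columns to all 144 slots before renaming. The names wide and all_slots are illustrative and not part of the original answer.

```python
import pandas as pd

# Small stand-in for the question's data: only the first 4 slots of one day.
df = pd.DataFrame({
    "Timestamp": pd.to_datetime([
        "2016-12-01 00:00:00", "2016-12-01 00:10:00",
        "2016-12-01 00:20:00", "2016-12-01 00:30:00",
    ]),
    "X": [0.186090, 0.203160, 0.216228, 0.220723],
})
df["Date"] = df["Timestamp"].dt.floor("D")
df["Hours"] = df["Timestamp"].dt.time

wide = df.pivot(index="Date", columns="Hours", values="X")

# All 144 ten-minute slots of a day as datetime.time objects
# (00:00:00 ... 23:50:00); the anchor date is arbitrary.
all_slots = pd.date_range("2000-01-01", periods=144, freq="10min").time

# Slots with no data become NaN columns, so each F number
# always maps to the same time of day.
wide = wide.reindex(columns=all_slots)
wide.columns = [f"F{i + 1}" for i in range(len(wide.columns))]
```

With this, F4 still holds the 00:30:00 reading, and F5 through F144 exist but are NaN for the sample above.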

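Finally, use shift to fill the last column, F145, with the next row's F1 value, i.e. 00:00:00 of the following day. The last row has no following day, so its F145 is NaN: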

df['F145'] = df['F1'].shift(-1)
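
To see all the steps together, here is a self-contained sketch on synthetic data. It assumes two complete days at 10-minute intervals; the values in X are just a running counter, chosen so the result is easy to check by eye, and the name wide is illustrative.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the question's data: two full days, 144 readings each.
ts = pd.date_range("2016-12-01", periods=2 * 144, freq="10min")
df = pd.DataFrame({"Timestamp": ts, "X": np.arange(len(ts), dtype=float)})

# Split each timestamp into its day and its time of day.
df["Date"] = df["Timestamp"].dt.floor("D")
df["Hours"] = df["Timestamp"].dt.time

# One row per day, one column per time slot.
wide = df.pivot(index="Date", columns="Hours", values="X")
wide.columns = [f"F{i + 1}" for i in range(len(wide.columns))]
wide = wide.reset_index().rename_axis(None, axis=1)

# F145 is the 00:00:00 value of the next row; the last row gets NaN.
wide["F145"] = wide["F1"].shift(-1)

print(wide[["Date", "F1", "F144", "F145"]])
```

Keep in mind that shift(-1) takes the value from the next row, not the next calendar day. If a date is missing from the data, F145 would come from whichever day follows in the frame, so it may be worth reindexing the rows to a complete date range first.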
