Reputation: 13
I'm new to python and pandas and looking for the most efficient way to turn a dataframe like this:
node_id | unit | start_dt | 00:00:00 | 00:15:00 | 00:30:00 | 00:45:00 | 01:00:00 |
---|---|---|---|---|---|---|---|
50024 | temp | 2021-02-28 05:30:00 | 22.2 | 21.7 | 22.5 | 22.4 | 22.3 |
50024 | volt | 2021-02-28 05:30:00 | 122.2 | 122.8 | 123.5 | 124.1 | 122.6 |
51849 | temp | 2021-02-28 05:30:00 | 14.1 | 14.3 | 14.5 | 14.3 | 14.2 |
51849 | volt | 2021-02-28 05:30:00 | 124.1 | 124.5 | 125.1 | 123.9 | 125.2 |
Into something like this:
time | node_id | unit | value |
---|---|---|---|
2021-02-28 05:30:00 | 50024 | temp | 22.2 |
2021-02-28 05:45:00 | 50024 | temp | 21.7 |
2021-02-28 06:00:00 | 50024 | temp | 22.5 |
2021-02-28 06:15:00 | 50024 | temp | 22.4 |
2021-02-28 06:30:00 | 50024 | temp | 22.3 |
2021-02-28 05:30:00 | 50024 | volt | 122.2 |
2021-02-28 05:45:00 | 50024 | volt | 122.8 |
2021-02-28 06:00:00 | 50024 | volt | 123.5 |
2021-02-28 06:15:00 | 50024 | volt | 124.1 |
2021-02-28 06:30:00 | 50024 | volt | 122.6 |
2021-02-28 05:30:00 | 51849 | temp | 14.1 |
2021-02-28 05:45:00 | 51849 | temp | 14.3 |
2021-02-28 06:00:00 | 51849 | temp | 14.5 |
2021-02-28 06:15:00 | 51849 | temp | 14.3 |
2021-02-28 06:30:00 | 51849 | temp | 14.2 |
2021-02-28 06:30:00 | 51849 | volt | 124.1 |
2021-02-28 06:30:00 | 51849 | volt | 124.5 |
2021-02-28 06:30:00 | 51849 | volt | 125.1 |
2021-02-28 06:30:00 | 51849 | volt | 123.9 |
2021-02-28 06:30:00 | 51849 | volt | 125.2 |
Ultimately, I would like to take the resulting dataframe and load it into a database like TimescaleDB to learn more about that. Any help / guidance on this would be appreciated.
Upvotes: 1
Views: 52
Reputation: 862511
Use DataFrame.set_index
with DataFrame.stack
and then add timedeltas created by to_timedelta
to datetimes with DataFrame.pop
for use and then drop column level_3
:
df = df.set_index(['start_dt','node_id','unit']).stack().reset_index(name='value')
df['start_dt'] = pd.to_datetime(df['start_dt']) + pd.to_timedelta(df.pop('level_3'))
print (df)
start_dt node_id unit value
0 2021-02-28 05:30:00 50024 temp 22.2
1 2021-02-28 05:45:00 50024 temp 21.7
2 2021-02-28 06:00:00 50024 temp 22.5
3 2021-02-28 06:15:00 50024 temp 22.4
4 2021-02-28 06:30:00 50024 temp 22.3
5 2021-02-28 05:30:00 50024 volt 122.2
6 2021-02-28 05:45:00 50024 volt 122.8
7 2021-02-28 06:00:00 50024 volt 123.5
8 2021-02-28 06:15:00 50024 volt 124.1
9 2021-02-28 06:30:00 50024 volt 122.6
10 2021-02-28 05:30:00 51849 temp 14.1
11 2021-02-28 05:45:00 51849 temp 14.3
12 2021-02-28 06:00:00 51849 temp 14.5
13 2021-02-28 06:15:00 51849 temp 14.3
14 2021-02-28 06:30:00 51849 temp 14.2
15 2021-02-28 05:30:00 51849 volt 124.1
16 2021-02-28 05:45:00 51849 volt 124.5
17 2021-02-28 06:00:00 51849 volt 125.1
18 2021-02-28 06:15:00 51849 volt 123.9
19 2021-02-28 06:30:00 51849 volt 125.2
Or use DataFrame.melt
alternative:
df = df.melt(['start_dt','node_id','unit'])
df['start_dt'] = pd.to_datetime(df['start_dt']) + pd.to_timedelta(df.pop('variable'))
print (df)
start_dt node_id unit value
0 2021-02-28 05:30:00 50024 temp 22.2
1 2021-02-28 05:30:00 50024 volt 122.2
2 2021-02-28 05:30:00 51849 temp 14.1
3 2021-02-28 05:30:00 51849 volt 124.1
4 2021-02-28 05:45:00 50024 temp 21.7
5 2021-02-28 05:45:00 50024 volt 122.8
6 2021-02-28 05:45:00 51849 temp 14.3
7 2021-02-28 05:45:00 51849 volt 124.5
8 2021-02-28 06:00:00 50024 temp 22.5
9 2021-02-28 06:00:00 50024 volt 123.5
10 2021-02-28 06:00:00 51849 temp 14.5
11 2021-02-28 06:00:00 51849 volt 125.1
12 2021-02-28 06:15:00 50024 temp 22.4
13 2021-02-28 06:15:00 50024 volt 124.1
14 2021-02-28 06:15:00 51849 temp 14.3
15 2021-02-28 06:15:00 51849 volt 123.9
16 2021-02-28 06:30:00 50024 temp 22.3
17 2021-02-28 06:30:00 50024 volt 122.6
18 2021-02-28 06:30:00 51849 temp 14.2
19 2021-02-28 06:30:00 51849 volt 125.2
Upvotes: 1