DustyBones
DustyBones

Reputation: 13

Converting dataframe into time series

I'm new to python and pandas and looking for the most efficient way to turn a dataframe like this:

node_id unit start_dt 00:00:00 00:15:00 00:30:00 00:45:00 01:00:00
50024 temp 2021-02-28 05:30:00 22.2 21.7 22.5 22.4 22.3
50024 volt 2021-02-28 05:30:00 122.2 122.8 123.5 124.1 122.6
51849 temp 2021-02-28 05:30:00 14.1 14.3 14.5 14.3 14.2
51849 volt 2021-02-28 05:30:00 124.1 124.5 125.1 123.9 125.2

Into something like this:

time node_id unit value
2021-02-28 05:30:00 50024 temp 22.2
2021-02-28 05:45:00 50024 temp 21.7
2021-02-28 06:00:00 50024 temp 22.5
2021-02-28 06:15:00 50024 temp 22.4
2021-02-28 06:30:00 50024 temp 22.3
2021-02-28 05:30:00 50024 volt 122.2
2021-02-28 05:45:00 50024 volt 122.8
2021-02-28 06:00:00 50024 volt 123.5
2021-02-28 06:15:00 50024 volt 124.1
2021-02-28 06:30:00 50024 volt 122.6
2021-02-28 05:30:00 51849 temp 14.1
2021-02-28 05:45:00 51849 temp 14.3
2021-02-28 06:00:00 51849 temp 14.5
2021-02-28 06:15:00 51849 temp 14.3
2021-02-28 06:30:00 51849 temp 14.2
2021-02-28 06:30:00 51849 volt 124.1
2021-02-28 06:30:00 51849 volt 124.5
2021-02-28 06:30:00 51849 volt 125.1
2021-02-28 06:30:00 51849 volt 123.9
2021-02-28 06:30:00 51849 volt 125.2

Ultimately, I would like to take the resulting dataframe and load it into a database like TimescaleDB to learn more about that. Any help / guidance on this would be appreciated.

Upvotes: 1

Views: 52

Answers (1)

jezrael
jezrael

Reputation: 862511

Use DataFrame.set_index with DataFrame.stack and then add timedeltas created by to_timedelta to datetimes with DataFrame.pop for use and then drop column level_3:

df = df.set_index(['start_dt','node_id','unit']).stack().reset_index(name='value')

df['start_dt'] = pd.to_datetime(df['start_dt']) + pd.to_timedelta(df.pop('level_3'))
print (df)
              start_dt  node_id  unit  value
0  2021-02-28 05:30:00    50024  temp   22.2
1  2021-02-28 05:45:00    50024  temp   21.7
2  2021-02-28 06:00:00    50024  temp   22.5
3  2021-02-28 06:15:00    50024  temp   22.4
4  2021-02-28 06:30:00    50024  temp   22.3
5  2021-02-28 05:30:00    50024  volt  122.2
6  2021-02-28 05:45:00    50024  volt  122.8
7  2021-02-28 06:00:00    50024  volt  123.5
8  2021-02-28 06:15:00    50024  volt  124.1
9  2021-02-28 06:30:00    50024  volt  122.6
10 2021-02-28 05:30:00    51849  temp   14.1
11 2021-02-28 05:45:00    51849  temp   14.3
12 2021-02-28 06:00:00    51849  temp   14.5
13 2021-02-28 06:15:00    51849  temp   14.3
14 2021-02-28 06:30:00    51849  temp   14.2
15 2021-02-28 05:30:00    51849  volt  124.1
16 2021-02-28 05:45:00    51849  volt  124.5
17 2021-02-28 06:00:00    51849  volt  125.1
18 2021-02-28 06:15:00    51849  volt  123.9
19 2021-02-28 06:30:00    51849  volt  125.2

Or use DataFrame.melt alternative:

df = df.melt(['start_dt','node_id','unit'])

df['start_dt'] = pd.to_datetime(df['start_dt']) + pd.to_timedelta(df.pop('variable'))
print (df)
              start_dt  node_id  unit  value
0  2021-02-28 05:30:00    50024  temp   22.2
1  2021-02-28 05:30:00    50024  volt  122.2
2  2021-02-28 05:30:00    51849  temp   14.1
3  2021-02-28 05:30:00    51849  volt  124.1
4  2021-02-28 05:45:00    50024  temp   21.7
5  2021-02-28 05:45:00    50024  volt  122.8
6  2021-02-28 05:45:00    51849  temp   14.3
7  2021-02-28 05:45:00    51849  volt  124.5
8  2021-02-28 06:00:00    50024  temp   22.5
9  2021-02-28 06:00:00    50024  volt  123.5
10 2021-02-28 06:00:00    51849  temp   14.5
11 2021-02-28 06:00:00    51849  volt  125.1
12 2021-02-28 06:15:00    50024  temp   22.4
13 2021-02-28 06:15:00    50024  volt  124.1
14 2021-02-28 06:15:00    51849  temp   14.3
15 2021-02-28 06:15:00    51849  volt  123.9
16 2021-02-28 06:30:00    50024  temp   22.3
17 2021-02-28 06:30:00    50024  volt  122.6
18 2021-02-28 06:30:00    51849  temp   14.2
19 2021-02-28 06:30:00    51849  volt  125.2

Upvotes: 1

Related Questions