Reputation: 1045
The below code is not working I am trying to use my timestamp data, aggregate it in 10 minute intervals and do time series analysis in pandas, I am getting a value error on the last command and I dont know how to remediate
0 2016-01-01 00:11:52
1 2016-01-01 00:13:00
2 2016-01-01 00:14:49
3 2016-01-01 00:21:00
4 2016-01-01 00:23:05
5 2016-01-01 00:29:00
6 2016-01-01 00:30:00
7 2016-01-01 00:30:36
8 2016-01-01 00:32:00
9 2016-01-01 00:33:00
10 2016-01-01 00:36:40
11 2016-01-01 00:36:55
12 2016-01-01 00:41:00
13 2016-01-01 00:48:17
14 2016-01-01 00:50:49
15 2016-01-01 00:51:00
16 2016-01-01 00:53:00
17 2016-01-01 00:56:00
18 2016-01-01 00:57:00
19 2016-01-01 00:57:35
20 2016-01-01 01:01:00
21 2016-01-01 01:01:37
22 2016-01-01 01:02:07
23 2016-01-01 01:08:32
24 2016-01-01 01:09:00
25 2016-01-01 01:16:00
26 2016-01-01 01:18:47
27 2016-01-01 01:21:00
28 2016-01-01 01:27:34
29 2016-01-01 01:29:07
...
values = series.values
values = values.reshape((len(values), 1))
scaler = MinMaxScaler(feature_range=(0, 1))
scaler = scaler.fit(values)
Upvotes: 0
Views: 2578
Reputation: 294348
Assuming dtype
is Timestamp
... if not, do this first
series = pd.to_datetime(series)
Problem is you need to turn these into numbers that make sense for scaling.
I'd subtract the minimum date from the series to get a series of Timedelta
s. Then find the total seconds for each Timedelta
which is a float.
Now you're ready for MinMaxScaler
values = series.sub(series.min()).dt.total_seconds().values
values = values.reshape((len(values), 1))
scaler = MinMaxScaler(feature_range=(0, 1))
scaler = scaler.fit(values)
setup for debugging
from io import StringIO
import pandas as pd
txt = """0 2016-01-01 00:11:52
1 2016-01-01 00:13:00
2 2016-01-01 00:14:49
3 2016-01-01 00:21:00
4 2016-01-01 00:23:05
5 2016-01-01 00:29:00
6 2016-01-01 00:30:00
7 2016-01-01 00:30:36
8 2016-01-01 00:32:00
9 2016-01-01 00:33:00
10 2016-01-01 00:36:40
11 2016-01-01 00:36:55
12 2016-01-01 00:41:00
13 2016-01-01 00:48:17
14 2016-01-01 00:50:49
15 2016-01-01 00:51:00
16 2016-01-01 00:53:00
17 2016-01-01 00:56:00
18 2016-01-01 00:57:00
19 2016-01-01 00:57:35
20 2016-01-01 01:01:00
21 2016-01-01 01:01:37
22 2016-01-01 01:02:07
23 2016-01-01 01:08:32
24 2016-01-01 01:09:00
25 2016-01-01 01:16:00
26 2016-01-01 01:18:47
27 2016-01-01 01:21:00
28 2016-01-01 01:27:34
29 2016-01-01 01:29:07"""
series = pd.read_csv(StringIO(txt),
sep='\s{2,}', header=None,
index_col=0, squeeze=True,
engine='python').rename_axis(None)
series = pd.to_datetime(series)
series.sub(series.min()).dt.total_seconds()
Upvotes: 1
Reputation: 5007
Try to add this line:
series = pd.to_datetime(series)
before scaling, but it will turn your timestamps to floats. And i dont think it can be meaningfully reversed
Upvotes: 0