kwashington122
kwashington122

Reputation: 1045

Value error: time series data can not convert string to float

The below code is not working I am trying to use my timestamp data, aggregate it in 10 minute intervals and do time series analysis in pandas, I am getting a value error on the last command and I dont know how to remediate

0       2016-01-01 00:11:52
1       2016-01-01 00:13:00
2       2016-01-01 00:14:49
3       2016-01-01 00:21:00
4       2016-01-01 00:23:05
5       2016-01-01 00:29:00
6       2016-01-01 00:30:00
7       2016-01-01 00:30:36
8       2016-01-01 00:32:00
9       2016-01-01 00:33:00
10      2016-01-01 00:36:40
11      2016-01-01 00:36:55
12      2016-01-01 00:41:00
13      2016-01-01 00:48:17
14      2016-01-01 00:50:49
15      2016-01-01 00:51:00
16      2016-01-01 00:53:00
17      2016-01-01 00:56:00
18      2016-01-01 00:57:00
19      2016-01-01 00:57:35
20      2016-01-01 01:01:00
21      2016-01-01 01:01:37
22      2016-01-01 01:02:07
23      2016-01-01 01:08:32
24      2016-01-01 01:09:00
25      2016-01-01 01:16:00
26      2016-01-01 01:18:47
27      2016-01-01 01:21:00
28      2016-01-01 01:27:34
29      2016-01-01 01:29:07
               ...        
values = series.values
values = values.reshape((len(values), 1))
scaler = MinMaxScaler(feature_range=(0, 1))
scaler = scaler.fit(values)

Upvotes: 0

Views: 2578

Answers (2)

piRSquared
piRSquared

Reputation: 294348

Assuming dtype is Timestamp... if not, do this first

series = pd.to_datetime(series)

Problem is you need to turn these into numbers that make sense for scaling.
I'd subtract the minimum date from the series to get a series of Timedeltas. Then find the total seconds for each Timedelta which is a float.
Now you're ready for MinMaxScaler

values = series.sub(series.min()).dt.total_seconds().values
values = values.reshape((len(values), 1))
scaler = MinMaxScaler(feature_range=(0, 1))
scaler = scaler.fit(values)

setup for debugging

from io import StringIO
import pandas as pd

txt = """0       2016-01-01 00:11:52
1       2016-01-01 00:13:00
2       2016-01-01 00:14:49
3       2016-01-01 00:21:00
4       2016-01-01 00:23:05
5       2016-01-01 00:29:00
6       2016-01-01 00:30:00
7       2016-01-01 00:30:36
8       2016-01-01 00:32:00
9       2016-01-01 00:33:00
10      2016-01-01 00:36:40
11      2016-01-01 00:36:55
12      2016-01-01 00:41:00
13      2016-01-01 00:48:17
14      2016-01-01 00:50:49
15      2016-01-01 00:51:00
16      2016-01-01 00:53:00
17      2016-01-01 00:56:00
18      2016-01-01 00:57:00
19      2016-01-01 00:57:35
20      2016-01-01 01:01:00
21      2016-01-01 01:01:37
22      2016-01-01 01:02:07
23      2016-01-01 01:08:32
24      2016-01-01 01:09:00
25      2016-01-01 01:16:00
26      2016-01-01 01:18:47
27      2016-01-01 01:21:00
28      2016-01-01 01:27:34
29      2016-01-01 01:29:07"""

series = pd.read_csv(StringIO(txt),
                     sep='\s{2,}', header=None,
                     index_col=0, squeeze=True,
                     engine='python').rename_axis(None)

series = pd.to_datetime(series)

series.sub(series.min()).dt.total_seconds()

Upvotes: 1

Alexey Trofimov
Alexey Trofimov

Reputation: 5007

Try to add this line:

series = pd.to_datetime(series)

before scaling, but it will turn your timestamps to floats. And i dont think it can be meaningfully reversed

Upvotes: 0

Related Questions