Reputation: 6080
I am getting an issue with the timestamp
column in my csv file.
ValueError: could not convert string to float: '2020-02-21 22:00:00'
for this line:
import numpy as np
import pandas as pd
import matplotlib.pylab as plt
from datetime import datetime
from statsmodels.tools.eval_measures import rmse
from sklearn.preprocessing import MinMaxScaler
from keras.preprocessing.sequence import TimeseriesGenerator
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import LSTM
from keras.layers import Dropout
import warnings
warnings.filterwarnings("ignore")
"Import dataset"
df = pd.read_csv('fx_intraday_1min_GBP_USD.csv')
train, test = df[:-3], df[-3:]
scaler = MinMaxScaler()
scaler.fit(train) <----------- This line
train = scaler.transform(train)
test = scaler.transform(test)
n_input = 3
n_features = 4
generator = TimeseriesGenerator(train, train, length=n_input, batch_size=6)
model = Sequential()
model.add(LSTM(200, activation='relu', input_shape=(n_input, n_features)))
model.add(Dropout(0.15))
model.add(Dense(1))
model.compile(optimizers='adam', loss='mse')
model.fit_generator(generator, epochs=180)
How can I convert the timestamp
column (preferably when reading the csv) to a float?
Link to the dataset: https://www.alphavantage.co/query?function=FX_INTRADAY&from_symbol=GBP&to_symbol=USD&interval=1min&apikey=OF7SE183CNQLT9DW&datatype=csv
Upvotes: 4
Views: 8932
Reputation: 5395
Reading in CSV data applying conversion to the timestamp column to get float values:
>>> df = pd.read_csv('~/Downloads/fx_intraday_1min_GBP_USD.csv',
... converters={'timestamp':
... lambda t: pd.Timestamp(t).timestamp()})
>>> df
timestamp open high low close
0 1.582322e+09 1.2953 1.2964 1.2953 1.2964
1 1.582322e+09 1.2955 1.2957 1.2952 1.2957
2 1.582322e+09 1.2956 1.2958 1.2954 1.2957
3 1.582322e+09 1.2957 1.2958 1.2954 1.2957
4 1.582322e+09 1.2957 1.2958 1.2955 1.2956
.. ... ... ... ... ...
95 1.582317e+09 1.2966 1.2967 1.2964 1.2965
96 1.582317e+09 1.2967 1.2968 1.2965 1.2966
97 1.582317e+09 1.2965 1.2967 1.2964 1.2966
98 1.582317e+09 1.2964 1.2967 1.2962 1.2966
99 1.582316e+09 1.2963 1.2965 1.2961 1.2964
[100 rows x 5 columns]
This can be applied to other columns too. The converters
parameter takes a dictionary with the key being the column name and the value a function.
date_parser
could be useful if the timestamp data spans more than one column or is in some strange format. The callback can receive the text from one or more columns for processing. The parse_dates
parameter may need to be supplied with date_parser
to indicate which columns to apply the callback to. date_parser
is just a list of the column names or indices. An example of usage:
df = pd.read_csv('~/Downloads/fx_intraday_1min_GBP_USD.csv',
date_parser=lambda t: pd.Timestamp(t),
parse_dates=['timestamp'])
pd.read_csv()
with no date/time parameters produces a timestamp column of type object
. Simply specifying which column is the timestamp using parse_dates
and no other additional parameters fixes that:
>>> df = pd.read_csv('~/Downloads/fx_intraday_1min_GBP_USD.csv',
parse_dates=['timestamp'])
>>> df.dtypes
timestamp datetime64[ns]
open float64
high float64
low float64
close float64
As another user suggested, there's another way to convert the contents of a column using pd.to_datetime()
.
>>> df = pd.read_csv('~/Downloads/fx_intraday_1min_GBP_USD.csv')
>>> df.dtypes
timestamp object
open float64
high float64
low float64
close float64
dtype: object
>>> df['timestamp'] = pd.to_datetime(df['timestamp'])
>>> df.dtypes
timestamp datetime64[ns]
open float64
high float64
low float64
close float64
dtype: object
>>>
>>> df['timestamp'] = df['timestamp'].apply(lambda t: t.timestamp())
>>> df
timestamp open high low close
0 1.582322e+09 1.2953 1.2964 1.2953 1.2964
1 1.582322e+09 1.2955 1.2957 1.2952 1.2957
2 1.582322e+09 1.2956 1.2958 1.2954 1.2957
3 1.582322e+09 1.2957 1.2958 1.2954 1.2957
4 1.582322e+09 1.2957 1.2958 1.2955 1.2956
.. ... ... ... ... ...
95 1.582317e+09 1.2966 1.2967 1.2964 1.2965
96 1.582317e+09 1.2967 1.2968 1.2965 1.2966
97 1.582317e+09 1.2965 1.2967 1.2964 1.2966
98 1.582317e+09 1.2964 1.2967 1.2962 1.2966
99 1.582316e+09 1.2963 1.2965 1.2961 1.2964
[100 rows x 5 columns]
Or to do it all in one shot without pd.to_datetime()
:
>>> df = pd.read_csv('~/Downloads/fx_intraday_1min_GBP_USD.csv')
>>>
>>> df['timestamp'] = df['timestamp'] \
... .apply(lambda t: pd.Timestamp(t).timestamp())
>>>
Upvotes: 3
Reputation: 11
you can use pandas method to convert whole column to datetime:
import pandas as pd
pd.to_datetime(column)
and then if you want this column as a index use:
pd.set_index(column)
Upvotes: -1
Reputation: 369
It looks like you are trying to convert a string into a Datetime object.
To do this, you would need to use datetime.strptime
with a formatting string as the second argument. This string would need to match the format of the date within your csv file.
So you would need this code to convert your time string into a Datetime object:
from datetime import datetime
df = pd.read_csv('intraday.csv')
dftime = datetime.strptime(df, '%Y-%m-%d %H:%M:%S')
After this, you can do whatever you want to do with dftime
. If you want it to be a float (or timestamp), you can use this code:
timestamp = datetime.timestamp(dftime)
I hope this helps :)
Upvotes: 2