I am getting an issue with the timestamp column in my csv file. ValueError: could not convert string to float: '2020-02-21 22:00:00' for this line: import numpy as np import pandas as pd import matplotlib.pylab as plt from datetime import datetime from statsmodels.tools.eval_measures import rmse from sklearn.preprocessing import MinMaxScaler from keras.preprocessing.sequence import TimeseriesGenerator from keras.models import Sequential from keras.layers import Dense from keras.layers import LSTM from keras.layers import Dropout import warnings warnings.filterwarnings("ignore") "Import dataset" df = pd.read_csv('fx_intraday_1min_GBP_USD.csv') train, test = df[:-3], df[-3:] scaler = MinMaxScaler() scaler.fit(train) <----------- This line train = scaler.transform(train) test = scaler.transform(test) n_input = 3 n_features = 4 generator = TimeseriesGenerator(train, train, length=n_input, batch_size=6) model = Sequential() model.add(LSTM(200, activation='relu', input_shape=(n_input, n_features))) model.add(Dropout(0.15)) model.add(Dense(1)) model.compile(optimizers='adam', loss='mse') model.fit_generator(generator, epochs=180) How can I convert the timestamp column (preferably when reading the csv) to a float? Link to the dataset : https://www.alphavantage.co/query?function=FX_INTRADAY&from_symbol=GBP&to_symbol=USD&interval=1min&apikey=OF7SE183CNQLT9DW&datatype=csv

How to utilise the date_parser parameter of pandas.read_csv()

Reputation: 5395

Performing Conversion On CSV Input Columns While Reading In The Data

Reading in CSV data applying conversion to the timestamp column to get float values:

>>> df = pd.read_csv('~/Downloads/fx_intraday_1min_GBP_USD.csv', 
...                  converters={'timestamp': 
...                                 lambda t: pd.Timestamp(t).timestamp()})
>>> df
       timestamp    open    high     low   close
0   1.582322e+09  1.2953  1.2964  1.2953  1.2964
1   1.582322e+09  1.2955  1.2957  1.2952  1.2957
2   1.582322e+09  1.2956  1.2958  1.2954  1.2957
3   1.582322e+09  1.2957  1.2958  1.2954  1.2957
4   1.582322e+09  1.2957  1.2958  1.2955  1.2956
..           ...     ...     ...     ...     ...
95  1.582317e+09  1.2966  1.2967  1.2964  1.2965
96  1.582317e+09  1.2967  1.2968  1.2965  1.2966
97  1.582317e+09  1.2965  1.2967  1.2964  1.2966
98  1.582317e+09  1.2964  1.2967  1.2962  1.2966
99  1.582316e+09  1.2963  1.2965  1.2961  1.2964

[100 rows x 5 columns]

This can be applied to other columns too. The converters parameter takes a dictionary with the key being the column name and the value a function.

date_parser could be useful if the timestamp data spans more than one column or is in some strange format. The callback can receive the text from one or more columns for processing. The parse_dates parameter may need to be supplied with date_parser to indicate which columns to apply the callback to. date_parser is just a list of the column names or indices. An example of usage:

df = pd.read_csv('~/Downloads/fx_intraday_1min_GBP_USD.csv', 
                 date_parser=lambda t: pd.Timestamp(t), 
                 parse_dates=['timestamp'])

pd.read_csv() with no date/time parameters produces a timestamp column of type object. Simply specifying which column is the timestamp using parse_dates and no other additional parameters fixes that:

>>> df = pd.read_csv('~/Downloads/fx_intraday_1min_GBP_USD.csv', 
                     parse_dates=['timestamp'])
>>> df.dtypes
timestamp    datetime64[ns]
open                float64
high                float64
low                 float64
close               float64

Conversion of DataFrame Columns After Reading in CSV

As another user suggested, there's another way to convert the contents of a column using pd.to_datetime().

>>> df = pd.read_csv('~/Downloads/fx_intraday_1min_GBP_USD.csv')
>>> df.dtypes
timestamp     object
open         float64
high         float64
low          float64
close        float64
dtype: object
>>> df['timestamp'] = pd.to_datetime(df['timestamp'])
>>> df.dtypes
timestamp    datetime64[ns]
open                float64
high                float64
low                 float64
close               float64
dtype: object
>>> 
>>> df['timestamp'] = df['timestamp'].apply(lambda t: t.timestamp())
>>> df
       timestamp    open    high     low   close
0   1.582322e+09  1.2953  1.2964  1.2953  1.2964
1   1.582322e+09  1.2955  1.2957  1.2952  1.2957
2   1.582322e+09  1.2956  1.2958  1.2954  1.2957
3   1.582322e+09  1.2957  1.2958  1.2954  1.2957
4   1.582322e+09  1.2957  1.2958  1.2955  1.2956
..           ...     ...     ...     ...     ...
95  1.582317e+09  1.2966  1.2967  1.2964  1.2965
96  1.582317e+09  1.2967  1.2968  1.2965  1.2966
97  1.582317e+09  1.2965  1.2967  1.2964  1.2966
98  1.582317e+09  1.2964  1.2967  1.2962  1.2966
99  1.582316e+09  1.2963  1.2965  1.2961  1.2964

[100 rows x 5 columns]

Or to do it all in one shot without pd.to_datetime():

>>> df = pd.read_csv('~/Downloads/fx_intraday_1min_GBP_USD.csv')
>>>
>>> df['timestamp'] = df['timestamp'] \
...                      .apply(lambda t: pd.Timestamp(t).timestamp())
>>>

Upvotes: 3

Aiden Blishen Cuneo

Reputation: 369

It looks like you are trying to convert a string into a Datetime object.

To do this, you would need to use datetime.strptime with a formatting string as the second argument. This string would need to match the format of the date within your csv file.

So you would need this code to convert your time string into a Datetime object:

from datetime import datetime

df = pd.read_csv('intraday.csv')
dftime = datetime.strptime(df, '%Y-%m-%d %H:%M:%S')

After this, you can do whatever you want to do with dftime. If you want it to be a float (or timestamp), you can use this code:

timestamp = datetime.timestamp(dftime)

I hope this helps :)

Upvotes: 2

How to utilise the date_parser parameter of pandas.read_csv()

Answers (3)

Performing Conversion On CSV Input Columns While Reading In The Data

Conversion of DataFrame Columns After Reading in CSV

Related Questions