G Gr
G Gr

Reputation: 6080

How to utilise the date_parser parameter of pandas.read_csv()

I am getting an issue with the timestamp column in my csv file.

ValueError: could not convert string to float: '2020-02-21 22:00:00'

for this line:

    import numpy as np
import pandas as pd
import matplotlib.pylab as plt 
from datetime import datetime
from statsmodels.tools.eval_measures import rmse
from sklearn.preprocessing import MinMaxScaler
from keras.preprocessing.sequence import TimeseriesGenerator
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import LSTM
from keras.layers import Dropout
import warnings
warnings.filterwarnings("ignore")

"Import dataset"
df = pd.read_csv('fx_intraday_1min_GBP_USD.csv')


train, test = df[:-3], df[-3:]
scaler = MinMaxScaler()
scaler.fit(train) <----------- This line
train = scaler.transform(train)
test = scaler.transform(test)

n_input = 3
n_features = 4

generator = TimeseriesGenerator(train, train, length=n_input, batch_size=6)

model = Sequential()
model.add(LSTM(200, activation='relu', input_shape=(n_input, n_features)))
model.add(Dropout(0.15))
model.add(Dense(1))
model.compile(optimizers='adam', loss='mse')
model.fit_generator(generator, epochs=180)

How can I convert the timestamp column (preferably when reading the csv) to a float?

enter image description here

Link to the dataset: https://www.alphavantage.co/query?function=FX_INTRADAY&from_symbol=GBP&to_symbol=USD&interval=1min&apikey=OF7SE183CNQLT9DW&datatype=csv

Upvotes: 4

Views: 8932

Answers (3)

Todd
Todd

Reputation: 5395

Performing Conversion On CSV Input Columns While Reading In The Data

Reading in CSV data applying conversion to the timestamp column to get float values:

>>> df = pd.read_csv('~/Downloads/fx_intraday_1min_GBP_USD.csv', 
...                  converters={'timestamp': 
...                                 lambda t: pd.Timestamp(t).timestamp()})
>>> df
       timestamp    open    high     low   close
0   1.582322e+09  1.2953  1.2964  1.2953  1.2964
1   1.582322e+09  1.2955  1.2957  1.2952  1.2957
2   1.582322e+09  1.2956  1.2958  1.2954  1.2957
3   1.582322e+09  1.2957  1.2958  1.2954  1.2957
4   1.582322e+09  1.2957  1.2958  1.2955  1.2956
..           ...     ...     ...     ...     ...
95  1.582317e+09  1.2966  1.2967  1.2964  1.2965
96  1.582317e+09  1.2967  1.2968  1.2965  1.2966
97  1.582317e+09  1.2965  1.2967  1.2964  1.2966
98  1.582317e+09  1.2964  1.2967  1.2962  1.2966
99  1.582316e+09  1.2963  1.2965  1.2961  1.2964

[100 rows x 5 columns]

This can be applied to other columns too. The converters parameter takes a dictionary with the key being the column name and the value a function.

date_parser could be useful if the timestamp data spans more than one column or is in some strange format. The callback can receive the text from one or more columns for processing. The parse_dates parameter may need to be supplied with date_parser to indicate which columns to apply the callback to. date_parser is just a list of the column names or indices. An example of usage:

df = pd.read_csv('~/Downloads/fx_intraday_1min_GBP_USD.csv', 
                 date_parser=lambda t: pd.Timestamp(t), 
                 parse_dates=['timestamp'])

pd.read_csv() with no date/time parameters produces a timestamp column of type object. Simply specifying which column is the timestamp using parse_dates and no other additional parameters fixes that:

>>> df = pd.read_csv('~/Downloads/fx_intraday_1min_GBP_USD.csv', 
                     parse_dates=['timestamp'])
>>> df.dtypes
timestamp    datetime64[ns]
open                float64
high                float64
low                 float64
close               float64

Conversion of DataFrame Columns After Reading in CSV

As another user suggested, there's another way to convert the contents of a column using pd.to_datetime().

>>> df = pd.read_csv('~/Downloads/fx_intraday_1min_GBP_USD.csv')
>>> df.dtypes
timestamp     object
open         float64
high         float64
low          float64
close        float64
dtype: object
>>> df['timestamp'] = pd.to_datetime(df['timestamp'])
>>> df.dtypes
timestamp    datetime64[ns]
open                float64
high                float64
low                 float64
close               float64
dtype: object
>>> 
>>> df['timestamp'] = df['timestamp'].apply(lambda t: t.timestamp())
>>> df
       timestamp    open    high     low   close
0   1.582322e+09  1.2953  1.2964  1.2953  1.2964
1   1.582322e+09  1.2955  1.2957  1.2952  1.2957
2   1.582322e+09  1.2956  1.2958  1.2954  1.2957
3   1.582322e+09  1.2957  1.2958  1.2954  1.2957
4   1.582322e+09  1.2957  1.2958  1.2955  1.2956
..           ...     ...     ...     ...     ...
95  1.582317e+09  1.2966  1.2967  1.2964  1.2965
96  1.582317e+09  1.2967  1.2968  1.2965  1.2966
97  1.582317e+09  1.2965  1.2967  1.2964  1.2966
98  1.582317e+09  1.2964  1.2967  1.2962  1.2966
99  1.582316e+09  1.2963  1.2965  1.2961  1.2964

[100 rows x 5 columns]

Or to do it all in one shot without pd.to_datetime():

>>> df = pd.read_csv('~/Downloads/fx_intraday_1min_GBP_USD.csv')
>>>
>>> df['timestamp'] = df['timestamp'] \
...                      .apply(lambda t: pd.Timestamp(t).timestamp())
>>>

Upvotes: 3

Pahalwan
Pahalwan

Reputation: 11

you can use pandas method to convert whole column to datetime:

import pandas as pd

pd.to_datetime(column)

and then if you want this column as a index use:

pd.set_index(column)

Upvotes: -1

Aiden Blishen Cuneo
Aiden Blishen Cuneo

Reputation: 369

It looks like you are trying to convert a string into a Datetime object.

To do this, you would need to use datetime.strptime with a formatting string as the second argument. This string would need to match the format of the date within your csv file.

So you would need this code to convert your time string into a Datetime object:

from datetime import datetime

df = pd.read_csv('intraday.csv')
dftime = datetime.strptime(df, '%Y-%m-%d %H:%M:%S')

After this, you can do whatever you want to do with dftime. If you want it to be a float (or timestamp), you can use this code:

timestamp = datetime.timestamp(dftime)

I hope this helps :)

Upvotes: 2

Related Questions