Reputation: 323
def getAndBuildDatafrmeFromCsvBasic(filename):
colTypes = {'Open': 'float64', 'High': 'float64', 'Low': 'float64', 'Close': 'float64', 'Volume': 'float64'}
dfEurUsd2017 = pd.read_csv(filename, delimiter=",", index_col='Gmt time', dtype=colTypes, parse_dates=['Gmt time'])
return dfEurUsd2017
def getAndBuildDatafrmeFromCsv(filename):
df = pd.read_csv(filename, header=None)
df.columns = ['date', 'Open', 'High', 'Low', 'Close', 'Volume']
df.date = pd.to_datetime(df.date, format='%d.%m.%Y %H:%M:%S.%f')
df.index = df['date']
df = df[['Open', 'High', 'Low', 'Close', 'Volume']]
return df
Open High Low Close Volume
Gmt time
2017-12-04 23:00:00 1.06672 1.06699 1.06636 1.06698 1889.56
Open High Low Close Volume
Gmt time
2017-12-04 23:00:00 1.18686 1.18699 1.18666 1.18682 2004.46
Why does method1 incorrectly parse the values for Open,High,Low,Close,Volume? Method2 produces the correct output. I'm concerned why both methods output completely different numerical values, even the volume is different. Yet the csv file is the same.
Rows from CSV
04.12.2017 23:00:00.000,1.18686,1.18699,1.18666,1.18682,2004.4599999999998
04.12.2017 23:30:00.000,1.18682,1.18706,1.18652,1.18681,1242.68
05.12.2017 00:00:00.000,1.18681,1.18691,1.18639,1.18653,2666.81
05.12.2017 00:30:00.000,1.18653,1.18726,1.18650,1.18709,3567.2400000000007
05.12.2017 01:00:00.000,1.18708,1.18750,1.18707,1.18738,3105.4699999999993
05.12.2017 01:30:00.000,1.18738,1.18744,1.18691,1.18732,3561.5
05.12.2017 02:00:00.000,1.18732,1.18766,1.18704,1.18740,2706.6400000000003
Upvotes: 1
Views: 950
Reputation: 18916
I added dayfirst=True and your code works fine.
What pandas version are u using? And where would that falsy data come from?
import pandas as pd
data = '''\
Gmt time,Open,High,Low,Close,Volume
04.12.2017 23:00:00.000,1.18686,1.18699,1.18666,1.18682,2004.4599999999998
04.12.2017 23:30:00.000,1.18682,1.18706,1.18652,1.18681,1242.68
05.12.2017 00:00:00.000,1.18681,1.18691,1.18639,1.18653,2666.81
05.12.2017 00:30:00.000,1.18653,1.18726,1.18650,1.18709,3567.2400000000007
05.12.2017 01:00:00.000,1.18708,1.18750,1.18707,1.18738,3105.4699999999993
05.12.2017 01:30:00.000,1.18738,1.18744,1.18691,1.18732,3561.5
05.12.2017 02:00:00.000,1.18732,1.18766,1.18704,1.18740,2706.6400000000003
'''
with open('test.csv', 'w') as f:
f.write(data)
def getAndBuildDatafrmeFromCsvBasic(filename):
colTypes = {'Open': 'float64', 'High': 'float64', 'Low': 'float64', 'Close': 'float64', 'Volume': 'float64'}
dfEurUsd2017 = pd.read_csv(filename, delimiter=",", index_col='Gmt time', dtype=colTypes, parse_dates=['Gmt time'], dayfirst=True)
return dfEurUsd2017
print(getAndBuildDatafrmeFromCsvBasic('test.csv'))
Returns:
Open High Low Close Volume
Gmt time
2017-12-04 23:00:00 1.18686 1.18699 1.18666 1.18682 2004.46
2017-12-04 23:30:00 1.18682 1.18706 1.18652 1.18681 1242.68
2017-12-05 00:00:00 1.18681 1.18691 1.18639 1.18653 2666.81
2017-12-05 00:30:00 1.18653 1.18726 1.18650 1.18709 3567.24
2017-12-05 01:00:00 1.18708 1.18750 1.18707 1.18738 3105.47
2017-12-05 01:30:00 1.18738 1.18744 1.18691 1.18732 3561.50
2017-12-05 02:00:00 1.18732 1.18766 1.18704 1.18740 2706.64
Upvotes: 1