Reputation: 72
I'm trying to read a CSV file with hundreds of float columns. Half of them use '.' as the decimal mark and the other half use ','; none of them has a thousands separator. It would be helpful if the decimal parameter of pd.read_csv could accept both ',' and '.', but it seems that only length-1 decimal markers are supported. As a result, only half of my columns are imported with float dtype. The other half come in as object dtype and have to be converted to float separately.
>>> import pandas as pd
>>> df0 = pd.read_csv('example.csv')
>>> df0.head()
    col1   col2
0  123,2  12.02
1  22,15   1.50
>>> df0.dtypes
col1 object
col2 float64
dtype: object
>>> df1 = pd.read_csv('example.csv', decimal=',')
>>> df1.head()
     col1   col2
0  123.20  12.02
1   22.15    1.5
>>> df1.dtypes
col1 float64
col2 object
dtype: object
==> Is there any Pythonic way to import all columns as float, treating both '.' and ',' as the decimal mark?
Upvotes: 0
Views: 1580
Reputation: 4612
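Before you read the file, rewrite it so that every decimal comma becomes a point:
with open("example.csv") as f:
    content = f.read()
# Protect the "," delimiters between quoted fields so they are not replaced
content = content.replace('","', '###')
# Every remaining comma is assumed to be a decimal mark
content = content.replace(',', '.')
# Restore the delimiters
content = content.replace('###', '","')
# Note: this overwrites example.csv in place
with open("example.csv", "w") as f:
    f.write(content)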
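If overwriting example.csv is not desirable, the same replacement can probably be done in memory and the result passed to pd.read_csv through io.StringIO. This is only a minimal sketch: it assumes every field is double-quoted and the delimiter is a comma (which is what the '","' placeholder trick above relies on), and the helper name read_mixed_decimal_csv is made up for illustration.

```python
import io

import pandas as pd


def read_mixed_decimal_csv(text):
    """Parse CSV text whose quoted fields use ',' or '.' as decimal mark.

    Assumption: all fields are double-quoted and separated by commas.
    """
    # Protect the delimiters between quoted fields with a placeholder.
    text = text.replace('","', '###')
    # Every remaining comma is assumed to be a decimal mark.
    text = text.replace(',', '.')
    # Restore the delimiters.
    text = text.replace('###', '","')
    return pd.read_csv(io.StringIO(text))


# Small in-memory sample mirroring the question's data.
sample = '"col1","col2"\n"123,2","12.02"\n"22,15","1.50"\n'
df = read_mixed_decimal_csv(sample)
print(df.dtypes)
```

For a real file, the text would come from open("example.csv").read() instead of the sample string. The placeholder '###' must not occur anywhere in the data.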
Upvotes: 1
Reputation: 27869
You can select the object-dtype columns and convert them to float:
obj = df0.select_dtypes(include=['object']).apply(lambda x: x.apply(lambda y: float(y.replace(',', '.'))))
df0[obj.columns] = obj
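With hundreds of columns, a vectorized variant of the same idea may be faster than the nested apply. The sketch below swaps in Series.str.replace and pd.to_numeric, and it picks columns by "not numeric" rather than by object dtype; both choices are assumptions made here, not part of the code above, and the function name commas_to_float is hypothetical.

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype


def commas_to_float(df):
    """Return a copy of df with every non-numeric column parsed as float.

    Assumption: non-numeric columns hold strings such as '123,2'.
    """
    out = df.copy()
    for col in out.columns:
        if not is_numeric_dtype(out[col]):
            # Swap the decimal comma for a point, then parse the whole column.
            fixed = out[col].str.replace(',', '.', regex=False)
            out[col] = pd.to_numeric(fixed)
    return out


# Stand-in for the df0 produced by pd.read_csv('example.csv').
df0 = pd.DataFrame({'col1': ['123,2', '22,15'], 'col2': [12.02, 1.50]})
df_fixed = commas_to_float(df0)
print(df_fixed.dtypes)
```

pd.to_numeric raises on values it cannot parse. Passing errors='coerce' would turn them into NaN instead, if that behaviour is preferred.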
Upvotes: 0