JustForFun

Reputation: 72

Python - pandas - is it possible to read csv with multiple decimal marks

I'm trying to read a CSV file with hundreds of float columns. Half of them use '.' as the decimal mark and the other half use ','; none of them has a thousands separator. It would be helpful if the decimal parameter of pd.read_csv accepted both ',' and '.', but pandas reports "Only length-1 decimal markers supported" for this parameter. As a result, only half of my columns are imported with float dtype. The other half come in as object dtype and have to be converted to float separately.

>>> import pandas as pd
>>> df0 = pd.read_csv('example.csv')
>>> df0.head()
    col1   col2
0  123,2  12.02
1  22,15   1.50
>>> df0.dtypes
col1     object
col2    float64
dtype: object
>>> df1 = pd.read_csv('example.csv', decimal=',')
>>> df1.head()
     col1   col2
0  123.20  12.02
1   22.15    1.5
>>> df1.dtypes
col1    float64
col2     object
dtype: object

==> Is there a Pythonic way to import all columns as float, treating both '.' and ',' as the decimal mark?

Upvotes: 0

Views: 1580

Answers (2)

Alperen

Reputation: 4612

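Before reading the file with pandas, normalize the decimal commas in the raw text. This assumes every field is quoted, so the delimiters appear as "," and can be protected. Note that it overwrites example.csv in place:

with open("example.csv") as f:
    content = f.read()

# Temporarily hide the "," field delimiters so they survive the next step
content = content.replace('","', '###')
# Turn every remaining comma (the decimal commas) into a dot
content = content.replace(',', '.')
# Restore the field delimiters
content = content.replace('###', '","')

with open("example.csv", "w") as f:
    f.write(content)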
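
If you would rather not overwrite the original file, the same replacement can be done in memory and handed to pandas through io.StringIO. This is only a sketch under the same assumptions as above: every field is quoted and the string '###' never occurs in the data. The names normalize_decimal_commas and read_quoted_csv_mixed_decimals are made up for illustration.

```python
import io

import pandas as pd


def normalize_decimal_commas(text, placeholder="###"):
    """Replace decimal commas with dots, keeping "," field delimiters.

    Assumes every field is quoted and `placeholder` never occurs in the data.
    """
    text = text.replace('","', placeholder)  # protect the delimiters
    text = text.replace(',', '.')            # decimal commas -> dots
    return text.replace(placeholder, '","')  # restore the delimiters


def read_quoted_csv_mixed_decimals(path):
    """Hypothetical helper: read the file without modifying it on disk."""
    with open(path) as f:
        content = f.read()
    return pd.read_csv(io.StringIO(normalize_decimal_commas(content)))
```

Usage would then be df = read_quoted_csv_mixed_decimals("example.csv"), leaving example.csv untouched.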

Upvotes: 1

zipa

Reputation: 27869

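You can select the object columns and convert them to float:

# Columns that pandas could not parse as numbers still hold strings such as '123,2'
obj = df0.select_dtypes(include=['object']).apply(lambda col: col.str.replace(',', '.', regex=False).astype(float))
df0[obj.columns] = obj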
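
Another option is to skip type inference altogether: read every column as a string with dtype=str, then convert all of them in one pass. This is a minimal sketch that assumes every column is numeric and that no value contains a thousands separator (as stated in the question); read_csv_all_float is a made-up name.

```python
import io

import pandas as pd


def read_csv_all_float(source):
    """Read a CSV where each column uses either '.' or ',' as decimal mark.

    Assumes all columns are numeric and there are no thousands separators.
    """
    # dtype=str keeps pandas from guessing, so every column starts as text
    raw = pd.read_csv(source, dtype=str)
    # Normalize the decimal mark column by column, then cast to float
    return raw.apply(
        lambda col: col.str.replace(',', '.', regex=False).astype(float)
    )


# Small in-memory stand-in for example.csv
sample = io.StringIO('"col1","col2"\n"123,2","12.02"\n"22,15","1.50"\n')
df = read_csv_all_float(sample)
print(df.dtypes)
```

With a real file you would call read_csv_all_float('example.csv'). If some columns are genuinely text, this sketch raises a ValueError on the cast, so those columns would need to be excluded first.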

Upvotes: 0
