Rach Sharp

Reputation: 2454

pandas.read_csv converts decimal zero-padded float columns to int

I'm storing a Pandas DataFrame in a .csv file. One column holds integer data, but the database system treats it as a float for legacy reasons, so the .csv needs to store it as a float too. When I write it with df.to_csv, the zero-padded decimal part is preserved, so the column in the .csv file looks like this:

IntNumber
3.0
45.0
123.0
...

But when I load this with pandas.read_csv, it infers the type as int64 despite the trailing zero. I've looked through the pandas.read_csv docs and it looks like I can manually specify the datatype as float64, but I think there are multiple columns where this needs to be detected as float instead of int64. It would be useful to have the type inferred as float automatically whenever the trailing zero is present. Is this possible?

Here is how I'm loading the CSV; at this point the column's type in the dataframe has already been inferred as int64:

dataframe = pandas.read_csv("<csv_name>", index_col=0, parse_dates=True)

Upvotes: 0

Views: 1837

Answers (2)

chjortlund

Reputation: 4027

I'm unable to reproduce your problem in my Pandas version (0.23.1), but it's possible to be explicit about the types when reading the CSV file by using the dtype parameter.

Like this:

import pandas as pd
import numpy as np
from io import StringIO


def read_data():
    return StringIO("""IntNumber\n3.0\n45.0\n\n123.0""")

df = pd.read_csv(read_data(), dtype={'IntNumber': np.float32})
print(df.dtypes)

# Output:
# IntNumber    float32
# dtype: object

df = pd.read_csv(read_data(), dtype={'IntNumber': np.int32})
print(df.dtypes)

# Output:
# IntNumber    int32
# dtype: object
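Since the question mentions several columns that need the same treatment, here is a minimal sketch of building the dtype mapping from a list of column names. The CSV contents, the extra column names (OtherNumber, Label) and the float_columns list are made up for illustration; only the dtype parameter of read_csv comes from the answer above.

```python
from io import StringIO

import pandas as pd

# Hypothetical CSV: two integer-valued columns that must stay float,
# plus a text column that is left to normal type inference.
csv_text = "IntNumber,OtherNumber,Label\n3,7,a\n45,8,b\n123,9,c\n"

# Hypothetical list of the columns the database treats as float.
float_columns = ["IntNumber", "OtherNumber"]

# Force float64 only for the listed columns; other columns are inferred as usual.
df = pd.read_csv(
    StringIO(csv_text),
    dtype={name: "float64" for name in float_columns},
)
print(df.dtypes)
```

With this approach the listed columns come back as float64 even when the file stores the values without a decimal part, so it does not depend on how the CSV was written.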

Upvotes: 1

jpp

Reputation: 164843

I cannot replicate your issue; see the example below.

from io import StringIO
import pandas as pd

mystr = StringIO("""IntNumber
3.0
45.0
123.0""")

df = pd.read_csv(mystr)

# pandas 0.19.2, python 3.6.0
print(df.dtypes)

# Output:
# IntNumber    float64
# dtype: object

# pandas 0.23.1, python 3.6.4
print(df.dtypes)

# Output:
# IntNumber    float64
# dtype: object
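One way to narrow the problem down is to check what actually ends up in the file. The sketch below is only a diagnostic under the assumption that the column might have been int64 at the time it was written; the frame names are made up for illustration. It writes the same values once as float64 and once as int64 and reads both back.

```python
from io import StringIO

import pandas as pd

# Same values stored as float64 and as int64 (hypothetical frames).
float_frame = pd.DataFrame({"IntNumber": [3.0, 45.0, 123.0]})
int_frame = float_frame.astype("int64")

# to_csv returns the CSV text when no path is given.
float_csv = float_frame.to_csv(index=False)
int_csv = int_frame.to_csv(index=False)

print(float_csv)  # values are written with a trailing ".0"
print(int_csv)    # values are written without a decimal part

# Read both back and compare the inferred dtypes.
float_back = pd.read_csv(StringIO(float_csv))
int_back = pd.read_csv(StringIO(int_csv))
print(float_back.dtypes)
print(int_back.dtypes)
```

If the file on disk really contains values such as 3.0, read_csv should infer float64 as in the example above. An int64 result suggests the values were written without the decimal part, in which case the dtype parameter is the more reliable fix.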

Upvotes: 1
