johnchase

Reputation: 13725

How do I specify dtype with read_csv for data and index independently?

I am trying to specify the dtype when loading a pandas DataFrame with read_csv, but only for the data values, not the index. Is this possible?

from io import StringIO
import pandas as pd
my_csv = StringIO('''b, c\nx, 1, 2\ny, 3, 2''')

I would have assumed the following would work:

pd.read_csv(my_csv, dtype='int64')

But it fails with:

ValueError: invalid literal for int() with base 10: 'x'

I can load the table without specifying dtype and then cast just the values afterwards, but is there a way to do this directly when reading the table?

Upvotes: 0

Views: 1002

Answers (1)

EdChum

Reputation: 394329

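One method would be to read only the first row to get the column names, zip them with your desired dtype to build a dict, and then read the CSV again with that dict. Because the dict names only the data columns, the index is left alone: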

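import io
import pandas as pd

t = """b,c
x,1,2
y,3,2"""
# Read only the first row to get the data column names (the index is not among them)
cols = pd.read_csv(io.StringIO(t), nrows=1).columns
# Map every data column to the desired dtype
dtyp = dict(zip(cols, ['int64'] * len(cols)))
pd.read_csv(io.StringIO(t), dtype=dtyp).info()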

<class 'pandas.core.frame.DataFrame'>
Index: 2 entries, x to y
Data columns (total 2 columns):
b    2 non-null int64
c    2 non-null int64
dtypes: int64(2)
memory usage: 48.0+ bytes
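
If you need this in more than one place, the two-pass approach could be wrapped in a small function. This is only a sketch: the name read_csv_values_as and its csv_text parameter are made up for illustration and are not part of pandas, and it assumes the CSV is available as a string whose header names only the data columns.

```python
import io

import pandas as pd


def read_csv_values_as(csv_text, dtype):
    """Hypothetical helper: apply `dtype` to the data columns only."""
    # First pass: read a single row just to learn the data column names.
    cols = pd.read_csv(io.StringIO(csv_text), nrows=1).columns
    # Second pass: the per-column mapping names only the data columns,
    # so the inferred index column is not converted.
    return pd.read_csv(io.StringIO(csv_text), dtype={col: dtype for col in cols})


csv_text = "b,c\nx,1,2\ny,3,2"
df = read_csv_values_as(csv_text, "int64")
print(df.dtypes)
print(list(df.index))
```

A fresh StringIO is built for each pass because the same buffer object would have to be rewound with seek(0) before it could be read a second time. With a file on disk, you could probably just pass the same path to both read_csv calls.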

Upvotes: 2
