Reputation: 13725
I am trying to specify the dtype of the values when loading a pandas DataFrame, but only for the values in the DataFrame, not the index. Is this possible?
from io import StringIO
import pandas as pd
my_csv = StringIO('''b, c\nx, 1, 2\ny, 3, 2''')
I would have assumed the following would work:
pd.read_csv(my_csv, dtype='int64')
But it fails with:
ValueError: invalid literal for int() with base 10: 'x'
I can load the table without specifying dtype and then cast just the values afterwards, but is there a way I can do this directly when reading the table?
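For reference, the two-step workaround described here might look like the sketch below. It assumes every non-index column holds integers; `skipinitialspace=True` is an addition for illustration only, used to trim the spaces after the commas in the sample data.

```python
from io import StringIO

import pandas as pd

my_csv = StringIO('b, c\nx, 1, 2\ny, 3, 2')

# Read without dtype; the extra leading field in each data row becomes the index.
df = pd.read_csv(my_csv, skipinitialspace=True)

# Cast only the values; astype does not touch the index labels.
df = df.astype('int64')
```

This works, but it converts the data after parsing rather than during the read, which is what the question is trying to avoid.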
Upvotes: 0
Views: 1002
Reputation: 394329
One method would be to read just the header to get the column names, zip them with your desired dtype, and then read the CSV again. Because the index column has no name in the header, it is left out of the dtype mapping:
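In [6]:
import io
import pandas as pd

t="""b,c
x,1,2
y,3,2"""
# Read a single row just to recover the column names (the index column is unnamed)
cols = pd.read_csv(io.StringIO(t), nrows=1).columns
# Map every named column to the desired dtype
dtyp = dict(zip(cols,['int64'] * len(cols)))
pd.read_csv(io.StringIO(t), dtype=dtyp).info()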
<class 'pandas.core.frame.DataFrame'>
Index: 2 entries, x to y
Data columns (total 2 columns):
b 2 non-null int64
c 2 non-null int64
dtypes: int64(2)
memory usage: 48.0+ bytes
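If you need this in more than one place, the same two-pass idea can be wrapped in a small function. This is only a sketch: `read_csv_with_value_dtype` is a hypothetical name, not a pandas API, and it assumes the CSV is available as a string and that the index column is unnamed in the header, as in the example above.

```python
import io

import pandas as pd


def read_csv_with_value_dtype(text, dtype):
    """Hypothetical helper: apply `dtype` to every named column, not the index."""
    # First pass: read one row only, to recover the column names from the header.
    cols = pd.read_csv(io.StringIO(text), nrows=1).columns
    # Second pass: the unnamed index column is absent from the mapping,
    # so pandas infers its type as usual.
    return pd.read_csv(io.StringIO(text), dtype=dict.fromkeys(cols, dtype))


t = """b,c
x,1,2
y,3,2"""

df = read_csv_with_value_dtype(t, 'int64')
```

The cost is that the header is parsed twice, which is usually negligible next to reading the full file.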
Upvotes: 2