Pandas: read excel nrows not working, and dtype does not preserve 0 padding

Question

I have difficulties in defining the parameters when reading in the excel, when specifying the dtype and nrows.

Let's take an example this small table saved in excel .xlsx format. 'col1' numbers are padded with 0.

col1    col2
01  a
02  b
03  c
04  d

First question, I want to read the entire table but preserve the padding. I tried using dtype to define as object or str, and using converters too (below). The dtype is converted to object, however, padding is not preserved. Is there anyway to do this?

pd.read_excel(path, sheetname=0, dtype={'col1': object}, nrows=5)
pd.read_excel(path, sheetname=0, converters={'col1':lambda x: str(x)}, nrows=5)

Second question, I tried to pull a subset of the dataframe, using nrows (below). However, this does not work at all and still pull out the entire table.

pd.read_excel(path, sheetname=0, nrows=2)

For both instances, it work perfectly fine in pd.read_csv

I am using pandas v0.20.3.

Yuca · Accepted Answer

If you format something in excel that doesn't mean that the value stored in the excel file is actually '01'. Save it as a csv and open it in notepad. My guess is that you shouldn't see a '01' but a '1'

nrows is for pandas v 23 and you're on v 20

Pandas: read excel nrows not working, and dtype does not preserve 0 padding

Answers (2)

Related Questions