Jake
Jake

Reputation: 2912

Pandas: read excel nrows not working, and dtype does not preserve 0 padding

I have difficulties in defining the parameters when reading in the excel, when specifying the dtype and nrows.

Let's take an example this small table saved in excel .xlsx format. 'col1' numbers are padded with 0.

col1    col2
01  a
02  b
03  c
04  d

First question, I want to read the entire table but preserve the padding. I tried using dtype to define as object or str, and using converters too (below). The dtype is converted to object, however, padding is not preserved. Is there anyway to do this?

pd.read_excel(path, sheetname=0, dtype={'col1': object}, nrows=5)
pd.read_excel(path, sheetname=0, converters={'col1':lambda x: str(x)}, nrows=5)

Second question, I tried to pull a subset of the dataframe, using nrows (below). However, this does not work at all and still pull out the entire table.

pd.read_excel(path, sheetname=0, nrows=2)

For both instances, it work perfectly fine in pd.read_csv

I am using pandas v0.20.3.

Upvotes: 0

Views: 2466

Answers (2)

Yuca
Yuca

Reputation: 6091

If you format something in excel that doesn't mean that the value stored in the excel file is actually '01'. Save it as a csv and open it in notepad. My guess is that you shouldn't see a '01' but a '1'

nrows is for pandas v 23 and you're on v 20

Upvotes: 2

Burhan Khalid
Burhan Khalid

Reputation: 174624

The reason the formatting doesn't work, is because Excel's formatting only changes the way data is displayed, not how its stored.

To change the way data is stored; you need to change the native format of the file; or format the data the way you want.

In your case, you are converting it to a string, what you should do is convert it to a zero padded string; for which there is a special function called str.zfill().

The second part of your question is much simpler - nrows argument for read_excel was added in pandas version 0.23.0

Upvotes: 1

Related Questions