Reputation: 2912
I have difficulties in defining the parameters when reading in the excel, when specifying the dtype and nrows.
Let's take an example this small table saved in excel .xlsx format. 'col1' numbers are padded with 0.
col1 col2
01 a
02 b
03 c
04 d
First question, I want to read the entire table but preserve the padding. I tried using dtype to define as object or str, and using converters too (below). The dtype is converted to object, however, padding is not preserved. Is there anyway to do this?
pd.read_excel(path, sheetname=0, dtype={'col1': object}, nrows=5)
pd.read_excel(path, sheetname=0, converters={'col1':lambda x: str(x)}, nrows=5)
Second question, I tried to pull a subset of the dataframe, using nrows (below). However, this does not work at all and still pull out the entire table.
pd.read_excel(path, sheetname=0, nrows=2)
For both instances, it work perfectly fine in pd.read_csv
I am using pandas v0.20.3.
Upvotes: 0
Views: 2466
Reputation: 6091
If you format something in excel that doesn't mean that the value stored in the excel file is actually '01'. Save it as a csv and open it in notepad. My guess is that you shouldn't see a '01' but a '1'
nrows
is for pandas v 23 and you're on v 20
Upvotes: 2
Reputation: 174624
The reason the formatting doesn't work, is because Excel's formatting only changes the way data is displayed, not how its stored.
To change the way data is stored; you need to change the native format of the file; or format the data the way you want.
In your case, you are converting it to a string, what you should do is convert it to a zero padded string; for which there is a special function called str.zfill()
.
The second part of your question is much simpler - nrows
argument for read_excel
was added in pandas version 0.23.0
Upvotes: 1