Reputation: 1264
I'm using pandas to load an Excel spreadsheet that contains zip codes (e.g. 32771). The zip codes are stored as 5-digit strings in the spreadsheet. When they are pulled into a DataFrame using the commands...
xls = pd.ExcelFile("5-Digit-Zip-Codes.xlsx")
dfz = xls.parse('Zip Codes')
they are converted into numbers. So '00501' becomes 501.
So my questions are, how do I:
a. Load the DataFrame and keep the string type of the zip codes stored in the Excel file?
b. Convert the numbers in the DataFrame into a five digit string e.g. "501" becomes "00501"?
Upvotes: 7
Views: 11667
Reputation: 1704
The pandas.read_excel docs say that you can preserve the data exactly as it is stored in the Excel sheet by specifying dtype as object:
https://pandas.pydata.org/docs/reference/api/pandas.read_excel.html
dtype: Type name or dict of column -> type, default None. Data type for data or columns. E.g. {'a': np.float64, 'b': np.int32}. Use object to preserve data as stored in Excel and not interpret dtype. If converters are specified, they will be applied INSTEAD of dtype conversion.
So, something like this should work:
xls = pd.read_excel("5-Digit-Zip-Codes.xlsx", dtype={'zip_code': object, 'other_col': str})
(note: not at my work pc right now, so wasn't able to test it yet)
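Since the line above was untested, here is a minimal round-trip sketch of the same idea. It assumes an Excel engine such as openpyxl is installed, builds a small workbook in memory instead of reading 5-Digit-Zip-Codes.xlsx, and uses hypothetical column names ("zipcode", "row_id"). It passes str rather than object for the zip column, which is a substitution on my part, not what the docs excerpt recommends.

```python
import io

import pandas as pd

# Build a small workbook in memory so the example does not depend on a real file.
# The column names here are hypothetical stand-ins for the real sheet's headers.
source = pd.DataFrame({"zipcode": ["00501", "32771"], "row_id": [1, 2]})
buffer = io.BytesIO()
source.to_excel(buffer, sheet_name="Zip Codes", index=False)
buffer.seek(0)

# Ask pandas to read the zip column as strings instead of inferring a numeric type.
loaded = pd.read_excel(buffer, sheet_name="Zip Codes", dtype={"zipcode": str})
zip_values = loaded["zipcode"].tolist()
print(zip_values)
```

Note that this only helps when the cells really are stored as text. If Excel stored them as numbers, the leading zeros are already gone and the values still need padding afterwards.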
Upvotes: 0
Reputation: 6075
The previous answers have correctly suggested using zfill(5). However, if your zip codes are already in float datatype for some reason (I recently encountered data like this), you first need to convert them to int. Then you can use zfill(5).
df = pd.DataFrame({'zipcode':[11.0, 11013.0]})
zipcode
0 11.0
1 11013.0
df['zipcode'] = df['zipcode'].astype(int).astype(str).str.zfill(5)
zipcode
0 00011
1 11013
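One caveat: astype(int) raises an error if the float column contains missing values (NaN), which is a common reason a zip column ends up as float in the first place. A possible variation, sketched here with a hypothetical helper name (pad_zip_codes), goes through pandas' nullable "Int64" and "string" dtypes so that missing entries stay missing:

```python
import pandas as pd


def pad_zip_codes(values: pd.Series, width: int = 5) -> pd.Series:
    """Turn integer-valued floats (possibly with NaN) into zero-padded strings."""
    # The nullable integer dtype keeps missing values as <NA> instead of failing.
    as_int = values.astype("Int64")
    # The nullable string dtype lets .str.zfill skip the missing entries.
    return as_int.astype("string").str.zfill(width)


df = pd.DataFrame({"zipcode": [11.0, 11013.0, float("nan")]})
df["zipcode"] = pad_zip_codes(df["zipcode"])
print(df)
```

This assumes every non-missing value is a whole number; a value such as 11.5 would make the cast to "Int64" fail rather than be silently truncated.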
Upvotes: 0
Reputation: 880269
As a workaround, you could convert the ints to 0-padded strings of length 5 using Series.str.zfill:
df['zipcode'] = df['zipcode'].astype(str).str.zfill(5)
Demo:
import pandas as pd
df = pd.DataFrame({'zipcode':['00501']})
df.to_excel('/tmp/out.xlsx')
xl = pd.ExcelFile('/tmp/out.xlsx')
df = xl.parse('Sheet1')
df['zipcode'] = df['zipcode'].astype(str).str.zfill(5)
print(df)
yields
zipcode
0 00501
Upvotes: 8
Reputation: 52276
You can avoid pandas' type inference with a custom converter, e.g. if 'zipcode' was the header of the column with zip codes:
dfz = xls.parse('Zip Codes', converters={'zipcode': lambda x:x})
This is arguably a bug, since the column was originally stored as strings; I opened an issue about it.
Upvotes: 2
Reputation: 114038
str(my_zip).zfill(5)
or
print("{0:>05s}".format(str(my_zip)))
are two of the many ways to do this.
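Another of those ways, as a small sketch: an f-string with the 05d format spec. The helper name format_zip is made up for illustration, and it assumes the input is an int or a string of digits.

```python
def format_zip(my_zip) -> str:
    """Return a 5-character, zero-padded zip code string."""
    # int() accepts both 501 and "501"; 05d pads with zeros to width 5.
    return f"{int(my_zip):05d}"


print(format_zip(501))
print(format_zip("501"))
print(format_zip(32771))
```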
Upvotes: 1