Steve Maughan

Reputation: 1264

Python - Loading Zip Codes into a DataFrame as Strings?

I'm using Pandas to load an Excel spreadsheet which contains zip codes (e.g. 32771). The zip codes are stored as 5-digit strings in the spreadsheet. When they are pulled into a DataFrame using the commands...

xls = pd.ExcelFile("5-Digit-Zip-Codes.xlsx")
dfz = xls.parse('Zip Codes')

they are converted into numbers. So '00501' becomes 501.

So my questions are, how do I:

a. Load the DataFrame and keep the string type of the zip codes stored in the Excel file?

b. Convert the numbers in the DataFrame into five-digit strings, e.g. 501 becomes "00501"?

Upvotes: 7

Views: 11667

Answers (5)

Josef Joe Samanek

Reputation: 1704

The pandas.read_excel docs say that you can preserve the data exactly as stored in the Excel sheet by specifying dtype as object: https://pandas.pydata.org/docs/reference/api/pandas.read_excel.html

dtype: Type name or dict of column -> type, default None. Data type for data or columns. E.g. {'a': np.float64, 'b': np.int32}. Use object to preserve data as stored in Excel and not interpret dtype. If converters are specified, they will be applied INSTEAD of dtype conversion.

So, something like this should work:

dfz = pd.read_excel("5-Digit-Zip-Codes.xlsx", dtype={'zip_code': object, 'other_col': str})

(note: not at my work pc right now, so wasn't able to test it yet)
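As a rough sketch of how the dtype approach could be wrapped up: the helper below reads one sheet with the zip column forced to str and then pads it to five digits. The function name load_zip_codes and the column name 'zipcode' are assumptions for illustration, not names from the question's spreadsheet, and reading .xlsx files requires an Excel engine such as openpyxl to be installed.

```python
import pandas as pd


def load_zip_codes(path, sheet_name, column="zipcode"):
    """Read one sheet, keeping `column` as text and padding it to 5 digits.

    `column` is a hypothetical header name; adjust it to the real one.
    """
    # dtype={column: str} asks pandas to read that column as strings
    # instead of inferring a numeric type.
    df = pd.read_excel(path, sheet_name=sheet_name, dtype={column: str})
    # Pad in case the cells were stored as numbers (e.g. 501) in the workbook.
    df[column] = df[column].str.zfill(5)
    return df
```

Usage would look like dfz = load_zip_codes("5-Digit-Zip-Codes.xlsx", "Zip Codes"), assuming the header of the zip column really is 'zipcode'.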

Upvotes: 0

Sunit Gautam

Reputation: 6075

The previous answers have correctly suggested using zfill(5). However, if your zip codes are already stored as floats for some reason (I recently encountered data like this), you first need to convert them to int. Then you can use zfill(5).

df = pd.DataFrame({'zipcode':[11.0, 11013.0]})
    zipcode
0   11.0
1   11013.0
df['zipcode'] = df['zipcode'].astype(int).astype(str).str.zfill(5)
    zipcode
0   00011
1   11013
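One caveat: astype(int) raises an error if the column contains missing values (NaN). A minimal sketch that leaves missing entries as missing and pads the rest is shown here; the helper name float_zips_to_str is made up for illustration.

```python
import pandas as pd


def float_zips_to_str(series, width=5):
    """Convert float zip codes (e.g. 11.0) to zero-padded strings.

    Missing values are returned as None rather than raising.
    """
    def convert(value):
        if pd.isna(value):
            return None  # keep missing entries missing
        # int() drops the trailing ".0" before padding
        return str(int(value)).zfill(width)

    return series.map(convert)


df = pd.DataFrame({'zipcode': [11.0, 11013.0, None]})
df['zipcode'] = float_zips_to_str(df['zipcode'])
```

This goes element by element, so it is slower than the vectorised astype chain on large frames, but it does not fail on gaps in the data.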

Upvotes: 0

unutbu

Reputation: 880269

As a workaround, you could convert the ints to 0-padded strings of length 5 using Series.str.zfill:

df['zipcode'] = df['zipcode'].astype(str).str.zfill(5)

Demo:

import pandas as pd
df = pd.DataFrame({'zipcode':['00501']})
df.to_excel('/tmp/out.xlsx')
xl = pd.ExcelFile('/tmp/out.xlsx')
df = xl.parse('Sheet1')
df['zipcode'] = df['zipcode'].astype(str).str.zfill(5)
print(df)

yields

  zipcode
0   00501

Upvotes: 8

chrisb

Reputation: 52276

You can avoid pandas' type inference with a custom converter, e.g. if 'zipcode' was the header of the column with zip codes:

dfz = xls.parse('Zip Codes', converters={'zipcode': lambda x:x})

This is arguably a bug, since the column was originally string-encoded; I made an issue here

Upvotes: 2

Joran Beasley

Reputation: 114038

str(my_zip).zfill(5)

or

print("{0:>05s}".format(str(my_zip)))

are two of the many ways to do this.
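For a single value outside of pandas, the two variants can be compared side by side in plain Python. The function names pad_zip and pad_zip_fmt are only illustrative:

```python
def pad_zip(value, width=5):
    """Zero-pad via str.zfill; accepts an int or a string of digits."""
    return str(value).zfill(width)


def pad_zip_fmt(value, width=5):
    """Zero-pad via an integer format spec; value must be convertible to int."""
    return f"{int(value):0{width}d}"


print(pad_zip(501))      # 00501
print(pad_zip_fmt(501))  # 00501
```

The zfill version is a little more forgiving, since it works on anything that has a string form, while the format-spec version insists on an integer.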

Upvotes: 1
