Reputation: 1789
I have a script where I read from a csv file some zipcodes. The format of the zipcodes are like this:
zipcode
75180
90672
01037
20253
09117
31029
07745
90453
12105
18140
36108
10403
76470
06628
93105
88069
31094
84095
63069
Then I run a script:
import requests
import pandas as pd
import time
file = '/Users/zipcode.csv'
reader = pd.read_csv(file, sep=';', encoding='utf-8-sig')
zipcodes = reader["zipcode"].astype(str)
base_url = "https://api.blabla/?zipcode={zipcode}"
headers = {'Authentication': 'random'}
for zipcode in zipcodes:
url = base_url.format(zipcode=zipcode)
r = requests.get(url,
headers=headers)
for r_info in r.json()["data"]:
print zipcode,r_info["id"]
time.sleep(0.5)
However, whenever there is a zipcode starting with 0, the result I get is with 4 digits and it can't match with the actual 0. I have formatted my csv to have a text column in it, but still it doesn't work.
The zipcodes I get are like this:
zipcode
75180
90672
1037
20253
9117
31029
7745
90453
12105
18140
36108
10403
76470
6628
93105
88069
31094
84095
63069
Do you have any idea how can this be resolved?
Upvotes: 2
Views: 438
Reputation: 394269
You need to pass the dtype
as str
:
reader = pd.read_csv(file, sep=';', encoding='utf-8-sig', dtype=str)
to read those values as str:
In [152]:
import pandas as pd
import io
t="""zipcode
75180
90672
01037
20253
09117
31029
07745
90453
12105
18140
36108
10403
76470
06628
93105
88069
31094
84095
63069"""
df = pd.read_csv(io.StringIO(t), dtype=str)
df
Out[152]:
zipcode
0 75180
1 90672
2 01037
3 20253
4 09117
5 31029
6 07745
7 90453
8 12105
9 18140
10 36108
11 10403
12 76470
13 06628
14 93105
15 88069
16 31094
17 84095
18 63069
by default pandas sniffs the dytpes
and in this case it thinks they are numeric so you lose leading zeroes
You can also do this as a post-processing step by casting to str
and then using the vectorised str.zfill
:
In [154]:
df['zipcode'] = df['zipcode'].astype(str).str.zfill(5)
df
Out[154]:
zipcode
0 75180
1 90672
2 01037
3 20253
4 09117
5 31029
6 07745
7 90453
8 12105
9 18140
10 36108
11 10403
12 76470
13 06628
14 93105
15 88069
16 31094
17 84095
18 63069
Upvotes: 5