Sander Bol

Reputation: 37

Extracting zip code from a string with full address

I have scraped some websites to gather company data, including the address. Because of how the HTML is structured, I could only scrape the full address as a single string from one tag. An example of my output is shown below.

Streetname housenumber zip-code city country
Street 1 1234 AB Amsterdam Netherlands
Longerstreetname 22 9876 XY Den Haag Netherlands
Name: Address, Length: 314, dtype: object

Now I need to extract only the ZIP code into a new column for further analysis, because I need to find out which province each company is located in. I am mostly using pandas for data cleaning.

I have searched through numerous options for a way to extract the zip code, but I have not succeeded. Any help would be very much appreciated!


Upvotes: 0

Views: 4704

Answers (2)

Arzu Huseynov

Reputation: 210

I think you can use a regular expression (regex).

Example:

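import re


address = '7802 Grant Avenue Egg Harbor Township, NJ 08234'
# US ZIP code: five digits, optionally followed by a hyphen and the ZIP+4 extension
us_zip = r'(\d{5}-?\d{0,4})'
zip_code = re.search(us_zip, address)
# re.search returns None when nothing matches, so check before calling .group()
if zip_code:
    print(zip_code.group(1))  # 08234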

Important note: there is no universal zip code pattern; every country has its own format. If you scrape companies from different countries, you will need a separate regex for each country's format.

Hope this file helps: zip codes regex
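Applied to the data in the question, which uses Dutch postcodes, a country-specific pattern can be combined with pandas' `Series.str.extract`. The sketch below assumes Dutch postcodes are four digits (the first one 1-9), an optional space, and two capital letters; the two sample rows are copied from the question, and the column name `Zip Code` is a hypothetical choice.

```python
import pandas as pd

# Sample rows copied from the question; the real column has 314 entries.
df = pd.DataFrame({
    "Address": [
        "Street 1 1234 AB Amsterdam Netherlands",
        "Longerstreetname 22 9876 XY Den Haag Netherlands",
    ]
})

# Assumed Dutch postcode format: four digits (first digit 1-9),
# an optional space, then two capital letters.
# \b on both sides keeps the match from starting or ending inside another token.
nl_postcode = r"\b([1-9]\d{3}\s?[A-Z]{2})\b"

# With one capture group and expand=False, str.extract returns a Series;
# rows without a match get NaN instead of raising an error.
df["Zip Code"] = df["Address"].str.extract(nl_postcode, expand=False)

print(df["Zip Code"].tolist())  # ['1234 AB', '9876 XY']
```

Unlike splitting on spaces, this does not depend on how many words the street name or city has, and addresses without a recognizable postcode simply end up as NaN, which makes them easy to find and inspect.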

Upvotes: 3

Ishwar Venugopal

Reputation: 882

If the sample output posted in the question is the content of a column named Address (dtype object) in a DataFrame, a new column with the extracted zip codes can be created as follows:

# Split each address on spaces and keep tokens 2 and 3 (assumes a one-word street name)
df['Zip Code'] = df['Address'].str.split(" ").str[2:4].str.join(" ")
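A self-contained check of this split-based approach on the two sample rows from the question is shown below. It works row by row through the `.str` accessor; calling `str()` on the whole column would instead turn the entire Series into one string and assign that single value to every row.

```python
import pandas as pd

# Sample rows copied from the question.
df = pd.DataFrame({
    "Address": [
        "Street 1 1234 AB Amsterdam Netherlands",
        "Longerstreetname 22 9876 XY Den Haag Netherlands",
    ]
})

# str.split turns each address into a list of tokens,
# .str[2:4] slices each list, and .str.join glues the two tokens back together.
# Assumption: the street name is one word and the house number is one token.
df["Zip Code"] = df["Address"].str.split(" ").str[2:4].str.join(" ")

print(df["Zip Code"].tolist())  # ['1234 AB', '9876 XY']
```

Because the slice positions are fixed, a street name made of two words shifts every token by one, and the result would contain the house number and half of the postcode. A regex that matches the postcode format itself avoids that dependency.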

Upvotes: 0
