Brytedata
Brytedata

Reputation: 37

Extracting postcode from address

I have a housing data set with an address column. I want to visualize the correlation between postcode and price. However, the address is in the format (Norfolk Road, Edgbaston, Birmingham B15). How do I extract the postcode alone?

I have tried using

select substring_index(Address, ',', -1) 

but it gave me an error:

File "<ipython-input-29-2b936e3b990e>", line 1
    select substring_index(Address, ',', -1)
                         ^
SyntaxError: invalid syntax

How do I extract only the postcode, in this case B15?

What is the best way to visualize this data, since its a categorical variable not continuous. I think I should group the postcodes, but do I create a dummy?

Upvotes: 0

Views: 2077

Answers (1)

Clem G.
Clem G.

Reputation: 396

If your data are in a pandas DataFrame you can use the function extract() like this :

df["PostCode"] = df["Address"].str.extract(r"([A-Z]\d+)")

The function extract() uses the regexp to extract the first occurence of a capital letter followed by a one or several numbers representing the postcode you need.

Hope it helps

Upvotes: 1

Related Questions