Reputation: 37
I have a housing data set with an address column. I want to visualize the correlation between postcode and price. However, the address is in the format (Norfolk Road, Edgbaston, Birmingham B15)
. How do I extract the postcode alone?
I have tried using
select substring_index(Address, ',', -1)
but it gave me an error:
File "<ipython-input-29-2b936e3b990e>", line 1
select substring_index(Address, ',', -1)
^
SyntaxError: invalid syntax
How do I extract only the postcode, in this case B15
?
What is the best way to visualize this data, since its a categorical variable not continuous. I think I should group the postcodes, but do I create a dummy?
Upvotes: 0
Views: 2077
Reputation: 396
If your data are in a pandas DataFrame you can use the function extract() like this :
df["PostCode"] = df["Address"].str.extract(r"([A-Z]\d+)")
The function extract()
uses the regexp to extract the first occurence of a capital letter followed by a one or several numbers representing the postcode you need.
Hope it helps
Upvotes: 1