Reputation: 15
I have one dataframe with address data structured like this:
tbm_a['address']
Rue de blabla 20
Vossenstraat 7
Rue Père Jean 3 boite Z
Rue XSZFEFEF 331
I would like to split it into two columns: one with the street and one with the house number.
I tried this for loop, but it failed:
import re
s = list(zip(tbm_a['address']))
for addr in s:
    tbm_a['street'] = re.findall('[^\d]*', addr)[0].strip()
    tbm_a['num'] = str(addr[len(street):].strip().split(' '))
Then I tried this, which does get the number:
tbm_a['address_num'] = tbm_a['address'].str.extract('(?P<number>\d+)', expand=True)
But I couldn't manage to get the street name. Any suggestions?
Upvotes: 0
Views: 900
Reputation: 150785
From your data, you can do:
df.address.str.extract(r'(?P<Street>\D+) (?P<Number>\d+.*)')
Output:
          Street     Number
0  Rue de blabla         20
1   Vossenstraat          7
2  Rue Père Jean  3 boite Z
3   Rue XSZFEFEF        331
Keep in mind that this will not work correctly if the street name itself contains a digit, e.g. 5th Avenue, because \D+ cannot match digits.
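If that case matters, one possible variation is to let the street part match anything, but lazily, and to require whitespace before the first token that starts with a digit. This is only a sketch: the "5th avenue 12" row is a made-up example that is not in your data, and the pattern assumes the house number is the first whitespace-separated token beginning with a digit, so it will still split in the wrong place for names like "Rue du 11 Novembre 5".

```python
import pandas as pd

# Sample rows from the question, plus one hypothetical row
# with a digit at the start of the street name.
df = pd.DataFrame({"address": [
    "Rue de blabla 20",
    "Vossenstraat 7",
    "Rue Père Jean 3 boite Z",
    "Rue XSZFEFEF 331",
    "5th avenue 12",  # hypothetical, not in the original data
]})

# Street: as few characters as possible (lazy), then whitespace,
# then Number: a digit followed by the rest of the string.
pattern = r"^(?P<Street>.+?)\s+(?P<Number>\d.*)$"

parts = df["address"].str.extract(pattern)
df = df.join(parts)
print(df)
```

Rows that do not match the pattern come back as NaN in both columns, so you can check for those afterwards with parts["Street"].isna().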
Upvotes: 1