Reputation: 321
I have a dataframe with a column containing an address and some text after it.
ex:
Address
123 Fake St, Boulder, CO 80304 Attached Dwelling/
345 Main St, Boulder, CO 80304 Vacant Land/Lots
456 Cool Dr, Erie, CO 80516 Attached Dwelling/Building
This is what I'd like to do
Address                          Type
123 Fake St, Boulder, CO 80304   Attached Dwelling/
345 Main St, Boulder, CO 80304   Vacant Land/Lots
456 Cool Dr, Erie, CO 80516      Attached Dwelling/Building
I thought this might work, using regex to look for the first digit, but working from right to left. However, I get the error "ValueError: Columns must be same length as key"
df[['Address', 'Type']] = df['Address'].str.rsplit('\d', n=1, expand=True)
Upvotes: 1
Views: 51
Reputation: 26676
If you want to use split, split on the whitespace that has five digits immediately to its left, and pass expand=True so each piece lands in its own column:
df.Address.str.split(r'(?<=\d{5})\s+', expand=True)
   0                                1
0  123 Fake St, Boulder, CO 80304   Attached Dwelling/
1  345 Main St, Boulder, CO 80304   Vacant Land/Lots
2  456 Cool Dr, Erie, CO 80516      Attached Dwelling/Building
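To write the result back into the two columns from the question, the split can be assigned directly. This is a minimal sketch, assuming the sample data from the question and that the first five-digit run followed by whitespace is always the ZIP code (a five-digit street number would break it). The n=1 argument is an addition not in the original answer.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({"Address": [
    "123 Fake St, Boulder, CO 80304 Attached Dwelling/",
    "345 Main St, Boulder, CO 80304 Vacant Land/Lots",
    "456 Cool Dr, Erie, CO 80516 Attached Dwelling/Building",
]})

# Split at most once, on the whitespace preceded by five digits.
parts = df["Address"].str.split(r"(?<=\d{5})\s+", n=1, expand=True)

# parts has exactly two columns, so the assignment lengths match.
df[["Address", "Type"]] = parts
```

Limiting the split to n=1 keeps the result at two columns even if the type text happens to contain another five-digit number followed by a space.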
Upvotes: 1
Reputation: 1862
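Apparently there is a known issue: rsplit does not support regex (SO question, open issue). The pattern '\d' is treated as a literal string, so nothing is split, only one column comes back, and assigning it to two columns raises the ValueError.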
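Since rsplit treats its pattern as a literal string, one regex-based alternative (not taken from the answers in this thread) is str.extract with two capture groups. This is a minimal sketch, assuming the sample data from the question and that every row contains a five-digit ZIP code followed by whitespace. The greedy first group gives the "search from the right" behaviour the question was after.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({"Address": [
    "123 Fake St, Boulder, CO 80304 Attached Dwelling/",
    "345 Main St, Boulder, CO 80304 Vacant Land/Lots",
    "456 Cool Dr, Erie, CO 80516 Attached Dwelling/Building",
]})

# Group 1: everything up to and including the last five-digit run
#          that is followed by whitespace (greedy match).
# Group 2: the remaining text after that whitespace.
df[["Address", "Type"]] = df["Address"].str.extract(r"^(.*\d{5})\s+(.*)$")
```

Rows that do not match the pattern come back as NaN in both columns, so it is worth checking for missing values afterwards on real data.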
Upvotes: 1