SpindriftSeltzer

Reputation: 321

Split column based on last digit found

I have a dataframe with a column containing an address and some text after it.

For example:

Address
123 Fake St, Boulder, CO 80304 Attached Dwelling/
345 Main St, Boulder, CO 80304 Vacant Land/Lots
456 Cool Dr, Erie, CO 80516 Attached Dwelling/Building

This is the result I'd like to get:

Address                               Type
123 Fake St, Boulder, CO 80304        Attached Dwelling/
345 Main St, Boulder, CO 80304        Vacant Land/Lots
456 Cool Dr, Erie, CO 80516           Attached Dwelling/Building

I thought this might work: use a regex to find the first digit when scanning from right to left (i.e. the last digit) and split there with rsplit. However, I get the error "ValueError: Columns must be same length as key":

df[['Address', 'Type']] = df['Address'].str.rsplit('\d', n=1, expand=True)
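A minimal reproduction, assuming the three sample rows above make up the whole DataFrame. It suggests why the assignment fails: Series.str.rsplit has no regex option, so the pattern is searched for as literal text, nothing is split, and only one column comes back for the two keys on the left.

```python
import pandas as pd

# Sample frame built from the three rows shown in the question
df = pd.DataFrame({"Address": [
    "123 Fake St, Boulder, CO 80304 Attached Dwelling/",
    "345 Main St, Boulder, CO 80304 Vacant Land/Lots",
    "456 Cool Dr, Erie, CO 80516 Attached Dwelling/Building",
]})

# rsplit treats r"\d" as the literal two characters backslash + "d"
parts = df["Address"].str.rsplit(r"\d", n=1, expand=True)
print(parts.shape)  # (3, 1): no row contains "\d", so nothing was split

error = None
try:
    # two target columns on the left, a single column on the right
    df[["Address", "Type"]] = parts
except ValueError as exc:
    error = exc
print(error)
```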

Upvotes: 1

Views: 51

Answers (2)

wwnde

Reputation: 26676

If you want to use split, split on the whitespace that has five digits immediately to its left, and pass expand=True so the pieces come back as separate columns:

 df.Address.str.split(r'(?<=\d{5})\s+', expand=True)


                         0                           1
0  123 Fake St, Boulder, CO 80304          Attached Dwelling/
1  345 Main St, Boulder, CO 80304            Vacant Land/Lots
2     456 Cool Dr, Erie, CO 80516  Attached Dwelling/Building
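A self-contained sketch of the same approach that writes the two pieces straight back into the frame, assuming the sample rows from the question. The n=1 argument is my addition so that at most one split happens per row; pandas infers that the pattern is a regex because it is longer than one character.

```python
import pandas as pd

# Sample frame built from the three rows shown in the question
df = pd.DataFrame({"Address": [
    "123 Fake St, Boulder, CO 80304 Attached Dwelling/",
    "345 Main St, Boulder, CO 80304 Vacant Land/Lots",
    "456 Cool Dr, Erie, CO 80516 Attached Dwelling/Building",
]})

# Split on the whitespace run preceded by five digits (the ZIP code);
# n=1 keeps any later spaces inside the "Type" part.
df[["Address", "Type"]] = df["Address"].str.split(r"(?<=\d{5})\s+", n=1, expand=True)
print(df)
```

Caveat: the lookbehind matches after any five consecutive digits, so a five-digit house number such as "12345 Main St" would be split at the wrong place.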

Upvotes: 1

Daniel Geffen

Reputation: 1862

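Apparently there is a known issue of rsplit not working with regex (SO question, open issue). Series.str.rsplit has no regex option, so '\d' is treated as the literal characters backslash and d. No address contains that text, so nothing is split, expand=True returns a single column, and assigning one column to ['Address', 'Type'] raises "ValueError: Columns must be same length as key".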
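Since rsplit cannot take a pattern, one workaround (my substitution, not something proposed in the linked issue) is Series.str.extract with a greedy group: ".*\d" consumes as much as it can and then backs off to the last digit that is followed by whitespace, which is the split point the title asks for. A sketch assuming the same sample rows:

```python
import pandas as pd

# Sample frame built from the three rows shown in the question
df = pd.DataFrame({"Address": [
    "123 Fake St, Boulder, CO 80304 Attached Dwelling/",
    "345 Main St, Boulder, CO 80304 Vacant Land/Lots",
    "456 Cool Dr, Erie, CO 80516 Attached Dwelling/Building",
]})

# Greedy ".*\d" runs to the LAST digit followed by whitespace;
# the named groups become the column names of the extracted frame.
pattern = r"^(?P<Address>.*\d)\s+(?P<Type>.*)$"
parts = df["Address"].str.extract(pattern)
df[["Address", "Type"]] = parts
print(df)
```

Rows that do not match the pattern come back as NaN in both columns, so check for those before overwriting Address.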

Upvotes: 1
