Removing trailing numbers inside a column pandas

Question

I have a column with 25k+ records, each containing information similar to the example data frame below.

Example data frame:

import pandas as pd

df = pd.DataFrame({'Address': ['Corner Great North Rd & Pipiwai Rd RD 6 Whangarei 2104',
                               '2305/142 Shakespeare Road Takapuna North Shore 0622',
                               '29 Stilwell Rd Mt Albert Auckland',
                               '10/70 Atkinson Ave Otahuhu Auckland 1062']})

I'm trying to remove the trailing numbers in each record so ideally I would get a column that contains:

Address
--------
Corner Great North Rd & Pipiwai Rd RD 6 Whangarei
2305/142 Shakespeare Road Takapuna North Shore
29 Stilwell Rd Mt Albert Auckland
10/70 Atkinson Ave Otahuhu Auckland

I've tried using regex to remove all the characters at the end of string until it hits the first white space.

pattern = re.sub("(.*\s).*", '\1', str)

df['Address'] = df.str.replace(pattern, '', regex=True)

This throws a TypeError, which I think is caused by the numbers in the string. I also suspect this approach could delete trailing words, removing information I want to keep.

My question: Is there a regex pattern that could be applied to the entire column?

Tim Biegeleisen · Accepted Answer

Using str.replace:

df['Address'] = df['Address'].str.replace(r'\s+\d+$', '', regex=True)

The regex pattern \s+\d+$ matches one or more whitespace characters followed by one or more digits at the very end of the address, so only a trailing number is removed and numbers elsewhere in the string are left alone.
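As a minimal sketch, here is the str.replace approach applied to the example data from the question. It assumes a recent pandas: since pandas 2.0, str.replace treats the pattern as a literal string by default, so regex=True is passed explicitly.

```python
import pandas as pd

# Example data copied from the question.
df = pd.DataFrame({'Address': [
    'Corner Great North Rd & Pipiwai Rd RD 6 Whangarei 2104',
    '2305/142 Shakespeare Road Takapuna North Shore 0622',
    '29 Stilwell Rd Mt Albert Auckland',
    '10/70 Atkinson Ave Otahuhu Auckland 1062',
]})

# \s+  one or more whitespace characters
# \d+  one or more digits
# $    end of the string, so only a trailing number can match
df['Address'] = df['Address'].str.replace(r'\s+\d+$', '', regex=True)

print(df['Address'].tolist())
```

Addresses without a trailing number, such as the third row, pass through unchanged. If only postcodes should be stripped, a tighter pattern such as \s+\d{4}$ may be safer; that variant assumes every postcode in the data is exactly four digits, which is not stated in the question.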
