Drop last word in pd.DataFrame

Question

I have a pandas dataframe with a column full of addresses. In general, I have inconsistent formatting and want to make the formatting consistent.

as an example:

df = pd.DataFrame({'apt': [1, 2, 3], 'unit': ['f', 'g', 'a'], 'address: 
['15 E 59th St', '4 4th St', '4 83rd street']})

I am currently formatting my column as so:

df['address'] = df['address'].str.title().str.replace(r"(\d+) 
(st|nd|rd|th)\b", r"\1")

to remove "ordinality" from the street numbers. However, sometimes the address ends in an abbreviation 'st' and sometimes the full word 'street' and I think simply removing the last word of each string would be the easiest way to get around this. How can I do this in a Pythonic way so that I am applying something to the entire column that removes the last word? I've tried rsplit and it won't work the way I have it:

df['address'].str.title().str.replace(r"(\d+)(st|nd|rd|th)\b", 
r"\1").str.rsplit([' ', 1])[0]

Edit: the current answer is very helpful, but I have realized that dropping the street type actually won't work in a few cases where a street and avenue have the same beginning (1 st and 1 ave). Is there a way to make the street endings consistent so that all versions of all street endings are the same?

Drop last word in pd.DataFrame

Answers (1)

Related Questions