Man
Man

Reputation: 191

Drop last word in pd.DataFrame

I have a pandas dataframe with a column full of addresses. In general, I have inconsistent formatting and want to make the formatting consistent.

as an example:

df = pd.DataFrame({'apt': [1, 2, 3], 'unit': ['f', 'g', 'a'], 'address: 
['15 E 59th St', '4 4th St', '4 83rd street']})

I am currently formatting my column as so:

df['address'] = df['address'].str.title().str.replace(r"(\d+) 
(st|nd|rd|th)\b", r"\1")

to remove "ordinality" from the street numbers. However, sometimes the address ends in an abbreviation 'st' and sometimes the full word 'street' and I think simply removing the last word of each string would be the easiest way to get around this. How can I do this in a Pythonic way so that I am applying something to the entire column that removes the last word? I've tried rsplit and it won't work the way I have it:

df['address'].str.title().str.replace(r"(\d+)(st|nd|rd|th)\b", 
r"\1").str.rsplit([' ', 1])[0]

Edit: the current answer is very helpful, but I have realized that dropping the street type actually won't work in a few cases where a street and avenue have the same beginning (1 st and 1 ave). Is there a way to make the street endings consistent so that all versions of all street endings are the same?

Upvotes: 2

Views: 4034

Answers (1)

BENY
BENY

Reputation: 323236

Doing with str.rsplit

df.address.str.rsplit(' ',1).str[0]
0    15 E 59th
1        4 4th
2       4 83rd
Name: address, dtype: object
df.address=df.address.str.rsplit(' ',1).str[0]

Upvotes: 10

Related Questions