Kaushik

Reputation: 255

How to remove uppercase characters from the end of a string when they are preceded by a lowercase character?

I'm scraping some data on college basketball teams from ESPN's BPI page (http://www.espn.com/mens-college-basketball/bpi/_/view/resume) to store in a pandas dataframe. When I read the HTML table into a dataframe, the abbreviated school name is appended to the full school name. For example, I have several strings that look like this: "North CarolinaUNC".

How can I remove the UNC from the end of the string? I tried the below regex to match characters at the end of strings:

name = "North CarolinaUNC"
name = re.sub(r"\z[A-Z]","", name)

but it won't work for schools whose names are made up of two words. Is there a way to write a rule that removes uppercase characters from a string when they are preceded by a lowercase character?

Upvotes: 1

Views: 622

Answers (1)

Jean-François Fabre

Reputation: 140256

Use $ to match the end of the string, and a positive lookbehind (which checks the preceding character without consuming it) to make sure the uppercase letters come directly after a lowercase letter:

import re
name = "North CarolinaUNC"
name = re.sub(r"(?<=[a-z])[A-Z]+$","", name)

This results in "North Carolina", as expected.

And with that expression, "North Carolina UNC" stays unmodified because the uppercase letters, even if at the end of the string, do not come after a lowercase letter.
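
To check the behaviour on more than one name, here is a minimal sketch that wraps the same pattern in a helper. The function name strip_trailing_abbreviation and the sample names other than "North CarolinaUNC" are made up for illustration; they are not taken from the ESPN table.

```python
import re

# Same pattern as above, compiled once so it can be reused for many names.
# (?<=[a-z])  the previous character must be a lowercase letter (not consumed)
# [A-Z]+$     one or more uppercase letters running to the end of the string
TRAILING_ABBREVIATION = re.compile(r"(?<=[a-z])[A-Z]+$")


def strip_trailing_abbreviation(name):
    """Drop a trailing run of uppercase letters that directly follows a lowercase letter.

    Hypothetical helper name, used here for illustration only.
    """
    return TRAILING_ABBREVIATION.sub("", name)


# Illustrative inputs (assumed, not scraped data).
names = ["North CarolinaUNC", "DukeDUKE", "North Carolina UNC", "Kansas"]
cleaned = [strip_trailing_abbreviation(n) for n in names]
print(cleaned)
# ['North Carolina', 'Duke', 'North Carolina UNC', 'Kansas']
```

For a pandas column, the same pattern should work with Series.str.replace(pattern, "", regex=True). Note that the rule only fires when a lowercase letter comes immediately before the uppercase run, so a name written entirely in capitals, or an abbreviation containing characters such as "&", would be left untouched and would need separate handling.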

Upvotes: 1
