Reputation: 23
I've got a column in a DataFrame that I want to clean up by removing the bracketed and parenthesized text.
1 Auburn (Auburn University)[1]
2 Florence (University of North Alabama)
3 Jacksonville (Jacksonville State University)[2]
4 Livingston (University of West Alabama)[2]
5 Montevallo (University of Montevallo)[2]
6 Troy (Troy University)[2]
7 Tuscaloosa (University of Alabama, Stillman Co...
8 Tuskegee (Tuskegee University)[5]
10 Fairbanks (University of Alaska Fairbanks)[2]
12 Flagstaff (Northern Arizona University)[6]
I used unitowns['City'].str.replace(r'\(.*\)', '', regex=True).str.replace(r'\[.*\]', '', regex=True)
to get the intended result, as follows:
1 Auburn
2 Florence
3 Jacksonville
4 Livingston
5 Montevallo
6 Troy
7 Tuscaloosa
8 Tuskegee
10 Fairbanks
12 Flagstaff
Is there a way to combine these expressions? This code does not seem to work -> unitowns['City'].str.replace('(\(.*\)) | (\[.*\])','')
Upvotes: 2
Views: 2237
Reputation: 402473
Option 1
str.extract / str.findall
Rather than removing the irrelevant content, why not extract the relevant part instead?
df.City.str.extract(r'(.*?)(?=\()', expand=False)
Or,
df.City.str.findall(r'(.*?)(?=\()').str[0]
0 Auburn
1 Florence
2 Jacksonville
3 Livingston
4 Montevallo
5 Troy
6 Tuscaloosa
7 Tuskegee
8 Fairbanks
9 Flagstaff
Name: City, dtype: object
You may also want to get rid of leading/trailing spaces after extraction. You can call str.strip on the result -
df.City = df.City.str.extract(r'(.*?)(?=\()', expand=False).str.strip()
Or,
df.City = df.City.str.findall(r'(.*?)(?=\()').str[0].str.strip()
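One caveat worth checking: the lookahead requires an opening parenthesis, so a row without one yields NaN from str.extract. Here is a minimal sketch of a fallback, assuming a small made-up frame (the bare 'Florence' row is hypothetical and not from the question's data):

```python
import pandas as pd

# Hypothetical sample; the second row deliberately has no parentheses.
df = pd.DataFrame({"City": ["Auburn (Auburn University)[1]", "Florence"]})

# Lazy capture up to (but not including) the first "(".
extracted = df["City"].str.extract(r"(.*?)(?=\()", expand=False)

# Rows with no "(" come back as NaN; fall back to the original value,
# then trim the space left in front of the parenthesis.
cleaned = extracted.fillna(df["City"]).str.strip()
print(cleaned.tolist())
```

If every row is known to contain a parenthesized part, as in the question, the fillna step is unnecessary.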
Regex Details
( # capture group
.*? # non-greedy matcher
)
(?= # lookahead
\( # opening parenthesis
)
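To see what the non-greedy quantifier buys you, the same pattern can be tried with Python's re module. This is only an illustrative sketch; the input string with two parenthesized parts is made up for the comparison:

```python
import re

# Hypothetical string with two parenthesized parts, to expose the difference.
s = "Troy (Troy University) (main campus)"

# Lazy: expands one character at a time and stops before the FIRST "(".
lazy = re.match(r"(.*?)(?=\()", s).group(1)

# Greedy: consumes everything, then backtracks only as far as the LAST "(".
greedy = re.match(r"(.*)(?=\()", s).group(1)

print(repr(lazy))
print(repr(greedy))
```

The lazy version keeps only the city name (plus a trailing space, hence the str.strip above), while the greedy one would drag the first parenthesized part along.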
Option 2
str.split
If your city names consist of only one word, str.split would also work. Note that n must be passed by keyword on pandas 2.0 and later.
df.City.str.split(r'\s', n=1).str[0]
0 Auburn
1 Florence
2 Jacksonville
3 Livingston
4 Montevallo
5 Troy
6 Tuscaloosa
7 Tuskegee
8 Fairbanks
9 Flagstaff
Name: City, dtype: object
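The one-word restriction is easy to demonstrate. In this sketch the 'New Haven' row is a hypothetical addition and does not appear in the question's data:

```python
import pandas as pd

# Hypothetical sample; "New Haven" is added only to show the limitation.
cities = pd.Series([
    "Auburn (Auburn University)[1]",
    "New Haven (Yale University)",
])

# Split once on the first whitespace character and keep the left part.
first_word = cities.str.split(r"\s", n=1).str[0]
print(first_word.tolist())
```

The multi-word name is cut off at its first space, so for data like that Option 1 or Option 3 is the safer choice.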
Option 3
str.replace
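Condensing your chained calls, you can use the following. Your combined attempt failed because the spaces around | are literal: the first alternative then needs a space right after ")", and the second needs a space right before "[", and your data has neither. On pandas 2.0 and later, regex=True is also required, since str.replace treats the pattern as a literal string by default.
df['City'].str.replace(r'\(.*?\)|\[.*?\]', '', regex=True).str.strip()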
0 Auburn
1 Florence
2 Jacksonville
3 Livingston
4 Montevallo
5 Troy
6 Tuscaloosa
7 Tuskegee
8 Fairbanks
9 Flagstaff
Name: City, dtype: object
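The effect of the spaces around | in the question's attempt can be checked directly. Here is a small sketch on two rows taken from the question's data (regex=True is passed explicitly, because newer pandas releases default to literal matching):

```python
import pandas as pd

city = pd.Series([
    "Auburn (Auburn University)[1]",
    "Florence (University of North Alabama)",
])

# The question's pattern: the spaces around "|" are part of each alternative,
# so it needs ") " or " [" in the text, and neither occurs here.
spaced = city.str.replace(r"(\(.*\)) | (\[.*\])", "", regex=True)

# Alternation without the stray spaces, using lazy quantifiers.
combined = city.str.replace(r"\(.*?\)|\[.*?\]", "", regex=True).str.strip()

print(spaced.tolist())
print(combined.tolist())
```

The spaced pattern leaves both strings untouched, while the condensed one reduces them to the bare city names.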
Upvotes: 5