Reputation: 114
Good day, is it possible to remove everything inside the square brackets, including the brackets themselves? Thanks in advance.
import pandas as pd
df = pd.DataFrame({'City': ['Santiago [1]','Madrid [2]','Barcelona [2]']})
df
City
0 Santiago [1]
1 Madrid [2]
2 Barcelona [2]
Desired output:
City
0 Santiago
1 Madrid
2 Barcelona
Upvotes: 0
Views: 201
Reputation: 323266
Use split + strip:
df.City=df.City.str.split('[').str[0].str.strip()
df
City
0 Santiago
1 Madrid
2 Barcelona
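A self-contained sketch of the split + strip approach. The extra 'Lima' row is a hypothetical addition, included only to show that a value without a bracket passes through unchanged. Because '[' is a single character, pandas treats it as a literal separator rather than as a regex.

```python
import pandas as pd

# Sample data from the question, plus one hypothetical row without a bracket
df = pd.DataFrame({'City': ['Santiago [1]', 'Madrid [2]', 'Barcelona [2]', 'Lima']})

# Keep everything before the first '[' and trim the space left in front of it.
# A one-character pattern like '[' is split on literally, not as a regex.
df['City'] = df['City'].str.split('[').str[0].str.strip()

print(df['City'].tolist())
```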
Upvotes: 2
Reputation: 19947
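This should work with one or more [xxx] tags appearing anywhere in your string (the raw string avoids an invalid-escape warning, the non-greedy .*? stops at the nearest ], and strip() removes any leftover edge spaces):
df.City = df.City.str.split(r'\s*\[.*?\]').str.join('').str.strip()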
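One caveat worth checking against your own data: `.*` is greedy, so when a string contains two tags, `\[.*\]` matches from the first `[` all the way to the last `]` and removes the text in between as well. A minimal comparison on hypothetical values, using the non-greedy `.*?`:

```python
import pandas as pd

# Hypothetical values; the second one has two bracketed tags
s = pd.Series(['Santiago [1]', 'New York [2] City [3]'])

# Greedy '.*' spans from the first '[' to the LAST ']', swallowing ' City'
greedy = s.str.split(r'\[.*\]').str.join('').str.strip()

# Non-greedy '.*?' stops at the nearest ']', so each tag is removed on its own;
# the leading '\s*' also consumes the space in front of each bracket
lazy = s.str.split(r'\s*\[.*?\]').str.join('').str.strip()

print(greedy.tolist())
print(lazy.tolist())
```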
Upvotes: 0
Reputation: 153
YOBEN_S's answer is perfect. I am just adding an alternative that doesn't need strip(): calling split() with no arguments splits the string on whitespace, so the first piece is the city name.
df.City=df.City.str.split().str[0]
df
City
0 Santiago
1 Madrid
2 Barcelona
EDIT: As Nick commented, this doesn't work for city names that contain spaces, such as New York. Here's an alternative that still splits on whitespace but drops only the last piece (the bracketed number):
df = pd.DataFrame({'City': ['Santiago [1]','Madrid [2]','Barcelona [2]','New York [2]','India and China [10]']})
df.City=df.City.apply(lambda x : " ".join(x.split()[:-1]))
df
City
0 Santiago
1 Madrid
2 Barcelona
3 New York
4 India and China
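A quick sketch of one limitation of the drop-the-last-word approach, assuming some rows (the hypothetical 'Lima' here) have no bracket at all: their last real word gets dropped. The `drop_tag` helper is a hypothetical guarded variant, not part of the answer above.

```python
import pandas as pd

df = pd.DataFrame({'City': ['New York [2]', 'India and China [10]', 'Lima']})

# Drops the LAST whitespace-separated token, assuming it is always the [n] tag
dropped = df['City'].apply(lambda x: " ".join(x.split()[:-1]))

# Hypothetical guarded variant: only drop the last token when it
# actually looks like a bracketed tag such as "[10]"
def drop_tag(text):
    parts = text.split()
    if parts and parts[-1].startswith('[') and parts[-1].endswith(']'):
        parts = parts[:-1]
    return " ".join(parts)

guarded = df['City'].apply(drop_tag)
print(dropped.tolist())
print(guarded.tolist())
```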
Upvotes: 1
Reputation: 28644
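Regex could work here as well: get all characters before the [ (expand=False returns a Series instead of a one-column DataFrame, and strip() removes the space left before the bracket):
df['extract'] = df.City.str.extract(r'(.*(?=\[))', expand=False).str.strip()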
City extract
0 Santiago [1] Santiago
1 Madrid [2] Madrid
2 Barcelona [2] Barcelona
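Because of the lookahead, the pattern only matches when a `[` follows, so a row without brackets comes back as NaN. A sketch, assuming you want such rows kept as they are (the 'Lima' row is hypothetical):

```python
import pandas as pd

df = pd.DataFrame({'City': ['Santiago [1]', 'Madrid [2]', 'Lima']})

# The lookahead (?=\[) requires a '[' after the captured text, so 'Lima'
# has no match and yields NaN; expand=False returns a Series
extracted = df['City'].str.extract(r'(.*(?=\[))', expand=False)

# Fall back to the original value where nothing matched, then trim the
# space that sat in front of the '['
df['extract'] = extracted.fillna(df['City']).str.strip()
print(df)
```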
Upvotes: 0