adey27

Reputation: 469

How to remove special characters from rows in pandas dataframe

I have a column in a pandas DataFrame like the one shown below:

LGA

Alpine (S)
Ararat (RC)
Ballarat (C)
Banyule (C)
Bass Coast (S)
Baw Baw (S)
Bayside (C)
Benalla (RC)
Boroondara (C)

What I want to do is remove the special characters, i.e. the bracketed suffixes such as (S) and (RC), from the end of each row.

The desired output is:

LGA

Alpine
Ararat
Ballarat
Banyule
Bass Coast
Baw Baw
Bayside
Benalla
Boroondara

I am not sure how to get the desired output shown above.

Any help would be appreciated.

Thanks

Upvotes: 6

Views: 2225

Answers (3)

Prayson W. Daniel

Reputation: 15558

You can use pandas str.replace with a regular expression that matches a pair of parentheses and everything between them:


…
dataf['LGA'] = dataf['LGA'].str.replace(r"\([^()]*\)", "", regex=True)

Demo


import pandas as pd

dataf = pd.DataFrame({
    "LGA": [
        "Alpine (S)", "Ararat (RC)", "Ballarat (C)",
        "Banyule (C)", "Bass Coast (S)", "Baw Baw (S)",
        "Bayside (C)", "Benalla (RC)", "Boroondara (C)",
    ]
})

output = dataf['LGA'].str.replace(r"\([^()]*\)", "", regex=True)

print(output)
0        Alpine 
1        Ararat 
2      Ballarat 
3       Banyule 
4    Bass Coast 
5       Baw Baw 
6       Bayside 
7       Benalla 
8    Boroondara 
Name: LGA, dtype: object
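
Note that the pattern above removes only the parentheses and their contents, so each value keeps the space that preceded the bracket. The following is a minimal sketch of one way to drop that space as well, assuming the bracketed code always sits at the end of the value. The extra `\s*` and `$` parts of the pattern and the shortened sample frame `df` are additions for illustration, not part of the answer above.

```python
import pandas as pd

# Shortened sample data, for illustration only.
df = pd.DataFrame({"LGA": ["Alpine (S)", "Ararat (RC)", "Bass Coast (S)"]})

# \s* consumes the whitespace before the bracket; $ anchors the match to the
# end of the string, so only a trailing "(...)" suffix is removed.
df["LGA"] = df["LGA"].str.replace(r"\s*\([^()]*\)\s*$", "", regex=True)

print(df["LGA"].tolist())
```

Calling `.str.strip()` on the result of the original replacement should give the same values for this data; the anchored pattern simply does both steps in one pass.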

Upvotes: 1

Gedas Miksenas

Reputation: 1059

I have a different approach using regex. It deletes anything between round or square brackets, then strips the leftover whitespace:

import re
import pandas as pd

data = {'LGA': ['Alpine (S)', 'Ararat (RC)', 'Bass Coast (S)']}
df = pd.DataFrame(data)
# Raw string avoids invalid-escape warnings; delete anything between brackets
df['LGA'] = [re.sub(r"[\(\[].*?[\)\]]", "", x).strip() for x in df['LGA']]

Upvotes: 2

Gerrit

Reputation: 26

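import pandas as pd

data = {'LGA': ['Alpine (S)', 'Ararat (RC)', 'Bass Coast (S)']}
df = pd.DataFrame(data)
# Split each value at '(': the text before it goes back into 'LGA' (still with
# its trailing space), the rest goes into a column that can be discarded.
df[['LGA', 'throw away']] = df['LGA'].str.split('(', expand=True)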
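
The split above leaves a trailing space in LGA and adds an extra column that then has to be dropped. As a hedged variation on the same idea, the sketch below splits once on the first "(" and keeps only the first piece, so no second column is created. The `n=1` limit and the `.str[0].str.rstrip()` chain are additions for illustration, not part of the answer as posted.

```python
import pandas as pd

df = pd.DataFrame({"LGA": ["Alpine (S)", "Ararat (RC)", "Bass Coast (S)"]})

# Split at most once on "(", keep the text before it, and remove the
# trailing space that preceded the bracket.
df["LGA"] = df["LGA"].str.split("(", n=1).str[0].str.rstrip()

print(df["LGA"].tolist())
```

Values without any "(" pass through unchanged apart from the trailing-whitespace strip, since the split then returns a single-element list.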

Upvotes: 1
