How to remove certain strings in a pandas DataFrame

Question

I have a dataframe, df, with a column, school_name, that holds different school names. I want to remove certain words and wonder what the best way to do this might be.

For example, I want to remove ‘male’ and ‘female’ from strings like:

‘gps hafiz shahmale p’
‘gpps mogal malep’ 
‘government primary school chak femalep’ 
‘govt girls high school syebadadfemale p’ 
‘ghs male p’
…

There are many other strings besides ‘male’ and ‘female’ that I want to remove, with similar complications. For example, I also want to remove ‘sbcombined’ from strings like:

'government girls high school chak no120sbcombinedp',
'govt boys elementary school chak no119sbcombined t',
'govt boys elementary school chak no 37 sbcombined p'
…

All I can think of right now is to write a separate function for each word, e.g. to remove ‘male’:

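l = df.school_name.tolist()

for i in l:
    # 'male' at the very end, or followed by one character and not part of 'female'
    if (i[-4:]=='male') or (i[-5:-1]=='male' and i[-7:-5]!='fe'):
        i2 = i.replace('male', '')
        # the column label must be a string; assign only when a match was found
        df.loc[df.school_name==i, 'school_name'] = i2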

Is there a better, more efficient way to go about this?

edit: I would also like to know how to deal with the fact that 'male' is part of the string 'female' (which I want to remove as well). When I use re.search to remove the word 'male' from strings that include 'female', only the 'male' part of 'female' gets removed and 'fe' is left behind, which I want to avoid.

Dishin H Goyani · Accepted Answer

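Use str.replace with a regular expression:

pattern = '|'.join(['male','female'])
df['school_name'] = df.school_name.str.replace(pattern, '', regex=True)

This replaces every occurrence of any word in the list with the empty string ''. On pandas 2.0 and later, regex=True is required, because the default is regex=False and the '|' would otherwise be matched literally. The regex engine takes the leftmost match, so 'female' is removed as a whole rather than leaving 'fe' behind.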
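For a longer word list, the same str.replace approach can be sketched as follows. This is only an illustration: the sample rows are taken from the question, while the words list, the school_name_clean column and the whitespace clean-up step are assumptions added here, not part of the original answer.

```python
import re
import pandas as pd

# Sample rows copied from the question.
df = pd.DataFrame({
    "school_name": [
        "gps hafiz shahmale p",
        "gpps mogal malep",
        "government primary school chak femalep",
        "govt boys elementary school chak no119sbcombined t",
    ]
})

# Hypothetical list of substrings to strip; extend as needed.
words = ["male", "female", "sbcombined"]

# Escape each word in case it contains regex metacharacters, and put longer
# words first so that a word which is a prefix of another (e.g. a
# hypothetical 'male' vs 'males') does not win the alternation.
pattern = "|".join(re.escape(w) for w in sorted(words, key=len, reverse=True))

df["school_name_clean"] = (
    df["school_name"]
    .str.replace(pattern, "", regex=True)   # remove the unwanted substrings
    .str.replace(r"\s+", " ", regex=True)   # collapse any doubled spaces
    .str.strip()                            # trim leading/trailing spaces
)

cleaned = df["school_name_clean"].tolist()
print(cleaned)
```

Writing to a new column keeps the original names available for checking; assign back to school_name instead if that is not needed.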
