Reputation: 43
I have a dataframe, df, with a column of school names, school_name. I want to remove certain words from those names, and wonder what the best way to go about this might be.
For example, I want to remove 'male' and 'female' from strings like:
‘gps hafiz shahmale p’
‘gpps mogal malep’
‘government primary school chak femalep’
‘govt girls high school syebadadfemale p’
‘ghs male p’
…
There are many other strings besides 'male' and 'female' that I want to remove, and they come with similar complications. For example, I also want to remove 'sbcombined' from strings like:
'government girls high school chak no120sbcombinedp',
'govt boys elementary school chak no119sbcombined t',
'govt boys elementary school chak no 37 sbcombined p'
…
All I could think of so far is to write a separate function for each word, e.g. to remove 'male':
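l = df.school_name.tolist()
for i in l:
    # 'male' ends the string, or sits just before the last character without being part of 'female'
    if (i[-4:] == 'male') or (i[-5:-1] == 'male' and i[-7:-5] != 'fe'):
        i2 = i.replace('male', '')
        # the column label must be the string 'school_name', not a bare name
        df.loc[df.school_name == i, 'school_name'] = i2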
Is there a better, more efficient way to go about this?
edit: I would also like to know how to deal with the fact that 'male' is part of 'female' (which I want to remove as well). When I use re.search to remove the word 'male', strings that contain 'female' lose only the 'male' part, so 'fe' is left behind. I want to avoid that.
Upvotes: 0
Views: 96
Reputation: 976
If you can specify the words you want to remove in a list, replace_word_list, try something like:
for word in replace_word_list:
    df['school_name'] = df['school_name'].str.replace(word, '')
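One caveat with replacing literal words one at a time: the order matters, because 'male' is a substring of 'female'. The following is a minimal sketch in plain Python of removing the longest words first. The helper name strip_words and the example list contents are assumptions for illustration, not part of the answer above.

```python
def strip_words(text, words):
    """Remove each literal word from text, longest word first."""
    # Longest first, so 'female' is removed before 'male' can leave 'fe' behind.
    for word in sorted(words, key=len, reverse=True):
        text = text.replace(word, '')
    return text


# Example list (an assumption; fill in your own words).
replace_word_list = ['male', 'female', 'sbcombined']
cleaned = strip_words('government primary school chak femalep', replace_word_list)
```

With pandas, the same ordering could be used by looping over sorted(replace_word_list, key=len, reverse=True) in the loop above, or by mapping the helper over the column with df['school_name'].map(lambda s: strip_words(s, replace_word_list)).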
Upvotes: 0
Reputation: 7723
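Use str.replace with a regular expression:
pattern = '|'.join(['male','female'])
# regex=True is needed on pandas 2.0+, where str.replace treats the pattern as a literal string by default
df['school_name'] = df.school_name.str.replace(pattern, '', regex=True)
It will replace every word in the list with the empty string ''.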
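As a rough sketch of how the same idea might be hardened, the version below uses only the standard re module: it escapes each word, puts longer words first in the alternation, and collapses the extra spaces a removal can leave behind. The function name remove_words, the word list and the sample strings (taken from the question) are illustrative assumptions.

```python
import re

# Example word list (an assumption; extend as needed).
words = ['male', 'female', 'sbcombined']

# Escape each word so it is matched literally, and list longer words first
# so an alternative like 'female' is tried before its substring 'male'.
pattern = re.compile(
    '|'.join(re.escape(w) for w in sorted(words, key=len, reverse=True))
)


def remove_words(text):
    """Delete every listed word, then collapse runs of whitespace."""
    stripped = pattern.sub('', text)
    return ' '.join(stripped.split())


samples = [
    'govt girls high school syebadadfemale p',
    'govt boys elementary school chak no 37 sbcombined p',
    'ghs male p',
]
cleaned = [remove_words(s) for s in samples]
```

On a dataframe this could be applied with df['school_name'].map(remove_words), assuming the column contains only strings.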
Upvotes: 1