Achillies

Reputation: 11

Removing substrings from a Data Frame

I used the following code:

remove_words=['Conference Call - Final.rtf','Conference Call - F(2).rtf','Final(2).rtf']
pat= '|'.join(remove_words)
pat
df['title'] = df['conference_name'].str.replace(pat,'')

but the result was not what I expected. The code successfully removed [Conference Call - Final.rtf], but it did not remove [Conference Call - F(2).rtf] or [Final(2).rtf]. My desired output should remove every substring in the list.

Upvotes: 0

Views: 50

Answers (2)

kelvt

Reputation: 1038

As Charles Duffy mentioned in the comments, parentheses have a special meaning in a regular expression (they delimit a capturing group), and you're calling str.replace with regex=True, which was the default in pandas before 2.0. The (2) in your pattern is therefore read as a group that matches the single character 2, not as literal parentheses, so the pattern never matches text that actually contains (2). To match the parentheses literally you have to escape them. re.escape does this for every special character in a string, including the dots.
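The effect is easy to reproduce with the re module alone. The snippet below is a minimal sketch, not part of the original answer; the variable names are made up for illustration, and it uses one of the strings from the question.

```python
import re

text = "Final(2).rtf"

# Unescaped: "(2)" is a capturing group around "2" and "." matches any
# character, so this pattern looks for "Final2" + any character + "rtf".
unescaped = "Final(2).rtf"
left_alone = re.sub(unescaped, "", text)             # no match, text unchanged
what_it_matches = re.sub(unescaped, "", "Final2.rtf")  # this string does match

# Escaped by hand: the backslashes make "(", ")" and "." literal characters.
escaped = r"Final\(2\)\.rtf"
removed = re.sub(escaped, "", text)                  # the whole string is removed

print(repr(left_alone), repr(what_it_matches), repr(removed))
```

The unescaped pattern silently matches a different string (Final2.rtf), which is why no error was raised in the question.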

Let's do:

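import re

remove_words=['Conference Call - Final.rtf','Conference Call - F(2).rtf','Final(2).rtf']
# Escape each literal so that (, ) and . are matched as plain characters
pat = '|'.join(re.escape(w) for w in remove_words)

# regex=True is passed explicitly because pandas 2.0+ defaults to regex=False
df['title'] = df['conference_name'].str.replace(pat, '', regex=True)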
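To check the escaped pattern end to end, here is a small sketch on made-up data. The three conference names are hypothetical and invented only for illustration; the column names follow the question. The trailing .str.strip() is an addition of this sketch, used to drop the space left behind after the removal.

```python
import re

import pandas as pd

# Hypothetical sample data; the real DataFrame from the question is not shown.
df = pd.DataFrame({
    "conference_name": [
        "Acme Q1 Conference Call - Final.rtf",
        "Acme Q2 Conference Call - F(2).rtf",
        "Acme Q3 Final(2).rtf",
    ]
})

remove_words = [
    "Conference Call - Final.rtf",
    "Conference Call - F(2).rtf",
    "Final(2).rtf",
]

# Unescaped pattern, as in the question: only the first row is cleaned.
raw_pat = "|".join(remove_words)
df["raw_title"] = df["conference_name"].str.replace(raw_pat, "", regex=True)

# Escaped pattern: every listed substring is treated as literal text.
pat = "|".join(re.escape(w) for w in remove_words)
df["title"] = (
    df["conference_name"]
    .str.replace(pat, "", regex=True)
    .str.strip()  # assumption: surrounding whitespace is not wanted
)

print(df[["raw_title", "title"]])
```

If the list ever contains one entry that is a prefix of another, putting the longer entry first in the list is a safe habit, because regex alternation tries the alternatives from left to right.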

Upvotes: 1

Peacepieceonepiece

Reputation: 681

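You can use the re module to delete a specific string from a single value, for example:

import re

# re.escape keeps the "." literal; this only cleans the first row of the column
re.sub(re.escape("Conference Call - Final.rtf"), '', df['conference_name'][0])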

Upvotes: 0
