Amrith Krishna

Reputation: 2853

Replace a word in a string if it belongs to a list of words in pandas

I have a pandas DataFrame with a column sent, which contains strings. If a string contains a word from a given list, that word needs to be replaced with a new word, say "new_word". But I am not sure how exactly this can be done without iterating through the rows in the DataFrame. Is there an efficient method to do so?

Finding a word in a string, where the word belongs to a list, can be achieved by:

wordList = ["word1", "word2", "word3", "word4"]
# Raw strings keep "\s" from being read as a string escape sequence
filtStr = r"\s" + r"\s|\s".join(wordList) + r"\s"
print(list(df[df["sent"].str.lower().str.contains(filtStr)].index))

Similarly, word replacement can be done if all I need to search for is a single word:

print(list(df["sent"].str.lower().str.replace("word1", "new_word")))

But I am not sure how exactly the word replacement can be done for a list of words without iterating through the rows.

Upvotes: 1

Views: 2290

Answers (2)

Amrith Krishna

Reputation: 2853

Apparently, str.replace accepts a regular expression in the same way str.contains does.

The solution is:

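wordList = ["word1", "word2", "word3", "word4"]
# Raw strings keep "\s" from being read as a string escape sequence
filtStr = r"\s" + r"\s|\s".join(wordList) + r"\s"
# regex=True is required since pandas 2.0, where str.replace defaults to literal matching
print(list(df["sent"].str.lower().str.replace(filtStr, "new_word", regex=True)))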
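One caveat with the pattern above: the \s delimiters are part of the match, so the surrounding whitespace is replaced along with the word, and a word at the very start or end of a string is not matched. A minimal sketch of a variant using word boundaries instead, assuming the goal is to replace whole words only (the sample DataFrame and the names pattern and replaced are made up for illustration):

```python
import re
import pandas as pd

# Hypothetical sample data, not taken from the question
df = pd.DataFrame({"sent": ["word1 is first",
                            "has word2 inside",
                            "ends with word3",
                            "myword1 stays"]})

wordList = ["word1", "word2", "word3", "word4"]

# \b matches at a word boundary without consuming whitespace;
# re.escape guards against regex metacharacters in the words
pattern = r"\b(?:" + "|".join(map(re.escape, wordList)) + r")\b"

replaced = df["sent"].str.lower().str.replace(pattern, "new_word", regex=True)
print(replaced.tolist())
```

Here "myword1" is left alone because there is no word boundary before "word1" inside it.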

Upvotes: -1

SSharma

Reputation: 953

Another way of doing this with a regular expression is:

import pandas as pd

test_df = pd.DataFrame(columns=["sent"], index=["x", "y", "z", "p"])
test_df.loc['x', 'sent'] = "I'm a superman; word1"
test_df.loc['y', 'sent'] = "I'm a superwoman; word2"
test_df.loc['z', 'sent'] = "I'm a spiderman; word3"
test_df.loc['p', 'sent'] = "I'm a batman; noword"
print(test_df)


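wordList = ["word1", "word2", "word3", "word4"]
# Build one alternation pattern: (word1|word2|word3|word4)
regx = r'({})'.format('|'.join(wordList))
# regex=True is required since pandas 2.0, where str.replace defaults to literal matching
test_df['sent'] = test_df['sent'].str.replace(regx, "new_word", regex=True).fillna(test_df['sent'])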
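If the match should ignore case, as the str.lower() call in the question suggests, one option is to pass a regex flag rather than lowercasing the whole sentence, which keeps the rest of the text unchanged. A rough sketch under that assumption (the sample rows and the names demo_df and result are illustrative only):

```python
import re
import pandas as pd

# Hypothetical sample rows with mixed-case words
demo_df = pd.DataFrame({"sent": ["I'm a superman; Word1",
                                 "I'm a batman; noword"]})

wordList = ["word1", "word2", "word3", "word4"]
regx = r'({})'.format('|'.join(wordList))

# flags=re.IGNORECASE matches "Word1" without altering the other text
result = demo_df["sent"].str.replace(regx, "new_word",
                                     flags=re.IGNORECASE, regex=True)
print(result.tolist())
```

Only the matched word changes; "I'm" keeps its capital letter, which would be lost with str.lower().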


Upvotes: 2
