Reputation: 2853
I have a pandas dataframe with a column sent, which contains strings. If a string contains a word from a given list, that word needs to be replaced with a new word, say "new_word". I am not sure how exactly this can be done without iterating through the rows of the DataFrame. Is there an efficient way to do this?
Finding a word in a string, where the word belongs to a list, can be achieved by:
wordList = ["word1","word2","word3","word4"]
filtStr = r"\s" + r"\s|\s".join(wordList) + r"\s"
print(list(df[df["sent"].str.lower().str.contains(filtStr)].index))
Similarly, word replacement can be done if all I need to search for is a single word:
print(list(df["sent"].str.lower().str.replace("word1", "new_word")))
But I am not sure how exactly the word replacement can be done for a list of words without iterating through the rows.
Upvotes: 1
Views: 2290
Reputation: 2853
Apparently, replace works similarly to contains.
The solution is:
wordList = ["word1","word2","word3","word4"]
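filtStr = r"\s" + r"\s|\s".join(wordList) + r"\s"
# regex=True is needed in pandas >= 2.0, where str.replace treats the pattern literally by default
print(list(df["sent"].str.lower().str.replace(filtStr, "new_word", regex=True)))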
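One caveat with the pattern above: each `\s` consumes the whitespace around the matched word, so the replacement runs into its neighbours, and a word at the very start or end of a string is not matched at all. A minimal sketch of one way around this, assuming word boundaries (`\b`) are what is wanted; the sample sentences and the `pattern` and `replaced` names are made up for illustration:

```python
import re
import pandas as pd

# Hypothetical sample data, not from the original question
df = pd.DataFrame({"sent": ["word1 starts here",
                            "ends with word2",
                            "mid word3 here",
                            "password1 stays"]})
wordList = ["word1", "word2", "word3", "word4"]

# \b matches at a word boundary without consuming any characters;
# re.escape guards against regex metacharacters in the word list
pattern = r"\b(?:" + "|".join(re.escape(w) for w in wordList) + r")\b"

replaced = df["sent"].str.lower().str.replace(pattern, "new_word", regex=True)
print(replaced.tolist())
```

With this pattern the surrounding spaces are kept, and "password1" is left alone because "word1" is not a whole word there.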
Upvotes: -1
Reputation: 953
Another way of doing this, using a regular expression, is:
import pandas as pd
test_df = pd.DataFrame(columns=["sent"], index=["x", "y", "z", "p"])
test_df.loc['x', 'sent'] = "I'm a superman; word1"
test_df.loc['y', 'sent'] = "I'm a superwoman; word2"
test_df.loc['z', 'sent'] = "I'm a spiderman; word3"
test_df.loc['p', 'sent'] = "I'm a batman; noword"
print(test_df)
wordList = ["word1","word2","word3","word4"]
regx = r'({})'.format('|'.join(wordList))
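# regex=True is needed in pandas >= 2.0, where str.replace treats the pattern literally by default
test_df['sent'] = test_df['sent'].str.replace(regx, "new_word", regex=True).fillna(test_df['sent'])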
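If the match should be case-insensitive, as the `.str.lower()` calls in the question suggest, `str.replace` can probably do that itself through `case=False`, which leaves the rest of the string in its original case. A small sketch under that assumption, with made-up sample rows:

```python
import pandas as pd

# Hypothetical sample data with mixed case
test_df = pd.DataFrame({"sent": ["I'm a Superman; WORD1",
                                 "I'm a Batman; noword"]})
wordList = ["word1", "word2", "word3", "word4"]
regx = r'({})'.format('|'.join(wordList))

# case=False ignores case while matching, so the text does not need lowercasing first
test_df['sent'] = test_df['sent'].str.replace(regx, "new_word", case=False, regex=True)
print(test_df['sent'].tolist())
```

Here "Superman" and "Batman" keep their capital letters, which would be lost if the column were lowercased before replacing.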
Upvotes: 2