lulu mirzai

Reputation: 77

Pandas: replace all words in a row with a certain value except for words from a list

I have a dataframe as follows but larger:

import pandas as pd

df = pd.DataFrame({"text": ["it is two degrees warmer", "it is five degrees warmer today", "it was ten degrees warmer and not cooler", "it is ten degrees cooler", "it is too frosty today", "it is a bit icy and frosty today"]})

allowed_list = ["cooler", "warmer", "frosty", "icy"]

I would like to replace every word that is not in the list with 'O', and keep the result comma-separated, like this:

Desired output:

 text
 0  O,O,O,O,warmer
 1  O,O,O,O,warmer,O
 2  O,O,O,O,warmer,O,O,cooler
 3  O,O,O,O,cooler
 4  O,O,O,frosty,O
 5  O,O,O,O,icy,O,frosty,O

What I have done so far is split each string into a list of words with str.split(' '), but I am not sure how to replace the words that are not in the list.

Upvotes: 0

Views: 67

Answers (1)

yatu

Reputation: 88226

You could use a list comprehension and join the words back together with ',' as the separator. Building a set from allowed_list also gives a faster membership lookup than a list:

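# A set gives constant-time (on average) membership tests
allowed_set = {"cooler", "warmer", "frosty", "icy"}
# Keep allowed words, replace everything else with 'O', then join with commas
df['text'] = [','.join([w if w in allowed_set else 'O' for w in s.split()])
              for s in df['text']]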

print(df)

                        text
0             O,O,O,O,warmer
1           O,O,O,O,warmer,O
2  O,O,O,O,warmer,O,O,cooler
3             O,O,O,O,cooler
4             O,O,O,frosty,O
5     O,O,O,O,icy,O,frosty,O
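
If you need this in more than one place, the same idea can be wrapped in a small helper. This is only a sketch: mask_words and its placeholder parameter are names made up for illustration, and it assumes exact, case-sensitive matches on whitespace-separated words, just like the comprehension above. It works on plain strings, so it can be tried without pandas.

```python
def mask_words(text, allowed, placeholder="O"):
    """Replace every whitespace-separated word not in `allowed` with `placeholder`.

    Hypothetical helper for illustration; matching is exact and case-sensitive.
    """
    return ",".join(w if w in allowed else placeholder for w in text.split())


allowed_set = {"cooler", "warmer", "frosty", "icy"}

rows = [
    "it is two degrees warmer",
    "it is a bit icy and frosty today",
]

# Apply the helper to each sentence
masked = [mask_words(s, allowed_set) for s in rows]
print(masked)
```

On the DataFrame from the question, this could be applied with df['text'].map(lambda s: mask_words(s, allowed_set)). Words with attached punctuation (for example "warmer.") would not match and would become 'O', so strip punctuation first if your data contains it.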

Upvotes: 2
