Reputation: 77
I have a dataframe as follows but larger:
import pandas as pd
df = pd.DataFrame({"text": ["it is two degrees warmer", "it is five degrees warmer today", "it was ten degrees warmer and not cooler", "it is ten degrees cooler", "it is too frosty today", "it is a bit icy and frosty today"]})
allowed_list = ["cooler", "warmer", "frosty", "icy"]
I would like to replace every word that is not in the list with 'O', keeping the result comma-separated, like this:
Desired output:
text
0 O,O,O,O,warmer
1 O,O,O,O,warmer,O
2 O,O,O,O,warmer,O,O,cooler
3 O,O,O,O,cooler
4 O,O,O,frosty,O
5 O,O,O,O,icy,O,frosty,O
What I have done so far is split each string row into a list on whitespace with str.split(' '), but I am not sure how to replace the words that are not in the list.
Upvotes: 0
Views: 67
Reputation: 88226
You could use a list comprehension and join the words back together with ',' as the separator. Building a set from allowed_list also gives faster membership lookups:
allowed_set = set(["cooler", "warmer", "frosty", "icy"])
df['text'] = [','.join([w if w in allowed_set else 'O' for w in s.split()])
              for s in df['text']]
print(df)
text
0 O,O,O,O,warmer
1 O,O,O,O,warmer,O
2 O,O,O,O,warmer,O,O,cooler
3 O,O,O,O,cooler
4 O,O,O,frosty,O
5 O,O,O,O,icy,O,frosty,O
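If you need the same replacement in more than one place, the comprehension above can be wrapped in a small helper. This is only a sketch: the name mask_words and its placeholder parameter are made up for illustration, and it assumes words are separated by whitespace and should match the allowed words exactly (same case, no attached punctuation).

```python
def mask_words(sentence, allowed, placeholder="O"):
    """Replace each whitespace-separated word not in `allowed` with `placeholder`.

    Hypothetical helper: exact, case-sensitive matching only.
    """
    return ",".join(w if w in allowed else placeholder for w in sentence.split())


# A set gives constant-time membership checks on average.
allowed_set = {"cooler", "warmer", "frosty", "icy"}

texts = ["it is two degrees warmer", "it is too frosty today"]
masked = [mask_words(s, allowed_set) for s in texts]
print(masked)
```

On a dataframe, the helper could then be applied per row with df['text'].map(lambda s: mask_words(s, allowed_set)).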
Upvotes: 2