Reputation: 75
Consider the dataframe below. I want to remove toy as a topic from the Topic column, and if a row has toy as its only topic, remove that row entirely. How can we do that in pandas?
+---+-----------------------------------+-------------------------+
| | Comment | Topic |
+---+-----------------------------------+-------------------------+
| 1 | ----- | toy, bottle, vegetable |
| 2 | ----- | fruit, toy, electronics |
| 3 | ----- | toy |
| 4 | ----- | electronics, fruit |
| 5 | ----- | toy, electronic |
+---+-----------------------------------+-------------------------+
Upvotes: 1
Views: 271
Reputation: 1
import pandas as pd

# create dummy test data
data = {'comment': [1, 2, 3, 4, 5],
        'Topic': ['toy, bottle, vegetable', 'fruit, toy, electronics', 'toy', 'electronics, fruit', 'toy, electronic']}
# create dataframe
df = pd.DataFrame(data)
# create function to tidy your data and remove toy
def remove_toy(row):
    # split on commas and trim the whitespace around each topic
    row = [i.strip() for i in row.split(',')]
    # keep every topic except 'toy'
    row = [i for i in row if i != 'toy']
    return ', '.join(row)
# apply function to series
df['Topic'] = df['Topic'].apply(remove_toy)
# remove empty rows in the Topic series
df = df[df['Topic'] != '']
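The same split, filter, rejoin idea can also be written with pandas string methods instead of a row-wise function. The following is only a sketch on the sample data from the question; the names `topics`, `kept` and `result` are illustrative and not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame({
    'comment': [1, 2, 3, 4, 5],
    'Topic': ['toy, bottle, vegetable', 'fruit, toy, electronics', 'toy',
              'electronics, fruit', 'toy, electronic'],
})

# Split each cell on commas, then explode so every topic sits on its own row
# (the original row label is repeated in the index).
topics = df['Topic'].str.split(',').explode().str.strip()

# Drop the 'toy' entries and join the remaining topics back per original row.
kept = topics[topics != 'toy'].groupby(level=0).agg(', '.join)

# Rows whose only topic was 'toy' have nothing left, so the inner join drops them.
result = df.drop(columns='Topic').join(kept, how='inner')
print(result)
```

Because the comparison is made on whole, stripped topics, a value such as `toys` would be left untouched.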
Upvotes: 0
Reputation: 79
Lambda functions can come in handy in such scenarios. This blanks out the rows whose only topic is toy, so they can be filtered out afterwards:
df['Topic'] = df['Topic'].apply(lambda x: "" if x.strip() == 'toy' else x)
Upvotes: 0
Reputation: 71570
Try using str.replace with str.rstrip and ne inside [...]:
df['Topic'] = df['Topic'].str.replace('toy', ' ').str.replace(' , ', '').str.rstrip()
print(df[df['Topic'].ne('')])
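One caveat with a plain substring replace is that it also hits `toy` inside longer words such as `toys`. A possible variant, sketched here under the assumption that topics are single words separated by commas, uses a regular expression with word boundaries; the pattern and the names `cleaned` and `result` are illustrative, not from the original answer.

```python
import pandas as pd

df = pd.DataFrame({
    'comment': [1, 2, 3, 4, 5],
    'Topic': ['toy, bottle, vegetable', 'fruit, toy, electronics', 'toy',
              'electronics, fruit', 'toy, electronic'],
})

# \b stops 'toy' from matching inside longer words such as 'toys';
# the optional comma and whitespace remove the separator that followed it.
cleaned = (
    df['Topic']
    .str.replace(r'\btoy\b\s*,?\s*', '', regex=True)
    .str.rstrip(', ')  # tidy up a dangling separator when 'toy' was last
)

# Keep only the rows that still have at least one topic.
result = df.assign(Topic=cleaned)[cleaned.ne('')]
print(result)
```

Note that a multi-word topic such as `toy car` would still be altered by this pattern, so the split-based approaches are safer when topics can contain spaces.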
Upvotes: 1