Reputation: 327
I have created a DataFrame column to store hashtags; each row of this column is a list of strings, like this:
df.hashtag
0 [#MondayMotivation, #BlackMamba, #RIPMamba, #c...
1 [#Periscope, #HeartGang, #SpreadLuv, #KobeRIP,...
2 [#Periscope, #HeartGang, #SpreadLuv, #KobeRIP,...
3 [#RoomOfMystery, #BuenLunes, #GRAMMYs, #27Ene,...
4 [#Periscope, #HeartGang, #SpreadLuv, #KobeRIP,...
5 [#Periscope, #HeartGang, #SpreadLuv, #KobeRIP,...
That is, each row of df.hashtag is a list like this:
df.hashtag[0]
['#MondayMotivation',
'#BlackMamba',
'#RIPMamba',
'#coronavirus',
'#love',
'#Califórnia']
As you can see, there are many similar hashtags with the same meaning, for instance #COV_19 and #COVID_19, so I want to replace all of them with the same string, #COVID19. I created a list of the hashtags that are not in the right format, like this:
token = ['#Covid_19',
'#covid2019',
'#covid19',
'#covid_19',
'#COVid',
'#COVID__19']
Then I tried the replace method, but it did not work:
df.replace(token,'#COVID-19',inplace=True)
How can I replace these hashtags with the string that I want?
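A likely reason the replace call has no effect: DataFrame.replace compares each whole cell with the values in to_replace, and here every cell is a list, so no cell is ever equal to a string such as '#covid19'. A minimal reproduction on a made-up one-row frame (the sample data is illustrative, not from the original post):

```python
import pandas as pd

# Made-up one-row frame shaped like df.hashtag in the question
df = pd.DataFrame({"hashtag": [["#coronavirus", "#covid19", "#love"]]})

# replace() matches whole cell values; the cell is a list, never the string '#covid19'
out = df.replace(["#covid19"], "#COVID19")
print(out.hashtag[0])  # the list comes back unchanged
```

So the replacement has to happen on the elements inside each list, which is what the answers below do.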
Upvotes: 2
Views: 908
Reputation: 587
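Alternatively, if each cell holds a single string rather than a list (on list cells the .str accessor returns NaN):
for t in token:
    # regex=False: treat each token as a literal substring
    df['hashtag'] = df['hashtag'].str.replace(t, '#COVID19', regex=False)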
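If the column holds lists, as in the question, the string-replace idea can still be used by joining each row into one string and splitting it back afterwards. A sketch, assuming no hashtag contains a space; the sample rows and the whole-word regex pattern are illustrative assumptions, not part of the original answer:

```python
import re
import pandas as pd

token = ['#Covid_19', '#covid2019', '#covid19', '#covid_19', '#COVid', '#COVID__19']

# Made-up sample rows in the question's shape (one list of hashtags per row)
df = pd.DataFrame({"hashtag": [["#MondayMotivation", "#covid19", "#love"],
                               ["#COVid19", "#COVID__19"]]})

# Match a token only as a whole space-separated word, so '#COVid' does not
# rewrite the start of '#COVid19'; re.escape guards any regex metacharacters.
pattern = r"(?<!\S)(?:" + "|".join(map(re.escape, token)) + r")(?!\S)"

df["hashtag"] = (df["hashtag"]
                 .str.join(" ")                                 # list -> "tag tag tag"
                 .str.replace(pattern, "#COVID19", regex=True)  # whole-word swap
                 .str.split(" "))                               # back to a list
```

One edge case: an empty list is joined to '' and split back as [''] rather than [].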
Another suggestion: for variants like the ones in your token list, you might want to clean up the data first, for example by capitalizing all hashtags, removing special characters and writing years in a fixed format. That way your token list is smaller and your loops are shorter.
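A sketch of that clean-up idea, assuming "special characters" means '#', '_' and other non-word characters. The helper names normalize and unify and the COVID_KEYS set are hypothetical. The six-entry token list collapses to three normalized keys; the year variant is simply listed as its own key rather than rewritten. Only COVID variants are replaced, so other hashtags keep their original spelling:

```python
import re
import pandas as pd

def normalize(tag):
    # Drop '#', '_' and other non-word characters, then uppercase.
    # \W is Unicode-aware for str patterns, so letters such as 'ó' survive.
    return re.sub(r"[\W_]", "", tag).upper()

# Hypothetical short list of COVID variants, written in normalized form
COVID_KEYS = {"COVID19", "COVID2019", "COVID"}

def unify(tags):
    # Replace a tag only if its normalized form is a known COVID variant
    return ["#COVID19" if normalize(t) in COVID_KEYS else t for t in tags]

df = pd.DataFrame({"hashtag": [["#Covid_19", "#love", "#Califórnia"],
                               ["#COVID__19", "#covid2019", "#COVid"]]})
df["hashtag"] = df["hashtag"].apply(unify)
```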
Upvotes: 0
Reputation: 8302
Here is a solution: first use Series.explode to put each hashtag on its own row, then pass replace a dict that maps every token to "#COVID_19", and finally groupby(level=0) to rebuild the original lists.
df['hashtag'] = (df.hashtag.explode()
                 .replace({t: "#COVID_19" for t in token})
                 .groupby(level=0).apply(list))
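The same explode, replace, groupby pipeline on a small made-up frame, split into named steps so each stage can be inspected (the sample rows and intermediate names are illustrative):

```python
import pandas as pd

token = ['#Covid_19', '#covid2019', '#covid19', '#covid_19', '#COVid', '#COVID__19']

# Made-up sample rows in the same shape as the question's column
df = pd.DataFrame({"hashtag": [["#MondayMotivation", "#covid19", "#love"],
                               ["#Periscope", "#COVID__19"]]})

mapping = {t: "#COVID_19" for t in token}
exploded = df.hashtag.explode()     # one row per hashtag; the index repeats per original row
fixed = exploded.replace(mapping)   # exact whole-value matches only
df["hashtag"] = fixed.groupby(level=0).apply(list)  # one list per original index again
```

One caveat: explode turns an empty list into a single NaN, so an empty row comes back as [nan] rather than [].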
Upvotes: 1
Reputation: 10624
You can do the following. If you have more variants to replace, add them to token.
token = ['#Covid_19',
'#covid2019',
'#covid19',
'#covid_19',
'#COVid',
'#COVID__19']
l = list(df.hashtag)
for i in range(len(l)):
    # keep each tag unless it is one of the variants in token
    l[i] = ['#COVID19' if x in token else x for x in l[i]]
df.hashtag = l
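If other groups of hashtags need the same treatment, one option is a single dict that maps each variant to its canonical tag, applied per row with Series.apply. A sketch; KOBE_VARIANTS, canonical and unify_tags are hypothetical names, and the second group is invented for illustration:

```python
import pandas as pd

COVID_VARIANTS = ['#Covid_19', '#covid2019', '#covid19', '#covid_19', '#COVid', '#COVID__19']
KOBE_VARIANTS = ['#RIPKobe', '#kobeRIP']  # hypothetical second group

# One lookup table: variant -> canonical tag. Add further groups the same way.
canonical = {t: '#COVID19' for t in COVID_VARIANTS}
canonical.update({t: '#KobeRIP' for t in KOBE_VARIANTS})

def unify_tags(tags):
    # dict.get(x, x) returns the tag itself when it is not a known variant
    return [canonical.get(x, x) for x in tags]

df = pd.DataFrame({"hashtag": [["#BlackMamba", "#covid19", "#RIPKobe"],
                               ["#Periscope", "#COVID__19"]]})
df["hashtag"] = df["hashtag"].apply(unify_tags)
```

A dict lookup is also constant time per tag, whereas `x in token` scans the whole list each time.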
Upvotes: 3