Helix Herry

Reputation: 327

How can I replace elements in a pandas column of string lists?

I have created a dataframe column to store hashtags. Each row of this column is a list of strings, like this:

df.hashtag

0        [#MondayMotivation, #BlackMamba, #RIPMamba, #c...
1        [#Periscope, #HeartGang, #SpreadLuv, #KobeRIP,...
2        [#Periscope, #HeartGang, #SpreadLuv, #KobeRIP,...
3        [#RoomOfMystery, #BuenLunes, #GRAMMYs, #27Ene,...
4        [#Periscope, #HeartGang, #SpreadLuv, #KobeRIP,...
5        [#Periscope, #HeartGang, #SpreadLuv, #KobeRIP,...

That is, each row of df.hashtag is a list like this:

df.hashtag[0]

['#MondayMotivation',
 '#BlackMamba',
 '#RIPMamba',
 '#coronavirus',
 '#love',
 '#Califórnia']

As you can see, many similar hashtags mean the same thing, for instance #COV_19 and #COVID_19, so I want to replace all of them with the single string #COVID19.

To do that, I created a list of the hashtags that are not in the right format, like this:

token = ['#Covid_19',
 '#covid2019',
 '#covid19',
 '#covid_19',
 '#COVid',
 '#COVID__19']

Then I tried the replace method, but it did not work:

df.replace(token,'#COVID19',inplace=True)

How can I replace these hashtags with the string that I want?

Upvotes: 2

Views: 908

Answers (3)

lrh09

Reputation: 587

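Alternatively, if each row holds a single string rather than a list (the .str accessor returns NaN for list values), you can loop over the tokens: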

for t in token:
    df['hashtag'] = df['hashtag'].str.replace(t, '#COVID19')

Another suggestion for a token list like yours: clean up the data first, for example by capitalizing all hashtags, removing special characters, and converting years to a fixed format. That way your token list is smaller and your loops are shorter.
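The clean-up idea can be sketched roughly as follows. This is only an illustration: `normalize`, `clean` and the `covid_variants` set are hypothetical names, and the sample rows are made up. The normalized form is used only for matching, so unrelated hashtags keep their original spelling.

```python
import re
import pandas as pd

def normalize(tag: str) -> str:
    """Upper-case a hashtag and drop '#', '_' and other non-alphanumeric symbols."""
    # Hypothetical helper: '#Covid_19' and '#COVID__19' both become '#COVID19'.
    return "#" + re.sub(r"[\W_]", "", tag.upper())

# Assumed set of normalized forms that should collapse to one canonical tag.
covid_variants = {"#COVID19", "#COVID2019", "#COVID"}

def clean(tags):
    # Replace a tag only when its normalized form is a known variant.
    return ["#COVID19" if normalize(t) in covid_variants else t for t in tags]

# Made-up sample data in the same shape as df.hashtag.
df = pd.DataFrame({"hashtag": [["#Covid_19", "#love"],
                               ["#covid2019", "#COVID__19", "#COVid"]]})
df["hashtag"] = df["hashtag"].apply(clean)
```

With this approach six spelling variants are covered by three normalized entries, which is the point of normalizing before matching.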

Upvotes: 0

sushanth

Reputation: 8302

Here is a solution: first use Series.explode to get one hashtag per row, then replace with a dict that maps each token to "#COVID19", and finally groupby the index to rebuild the original lists.

df["hashtag"] = (df.hashtag.explode().replace({t : "#COVID19" for t in token})
        .groupby(level=0).apply(list))
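To make the three steps easier to follow, here is the same chain split into named intermediate results on a small made-up dataframe. The sample rows and the variable names `mapping`, `exploded` and `replaced` are assumptions for illustration. Note that Series.explode needs pandas 0.25 or later.

```python
import pandas as pd

# Made-up sample data in the same shape as df.hashtag.
df = pd.DataFrame({"hashtag": [["#Covid_19", "#love"],
                               ["#covid19", "#COVid"],
                               ["#GRAMMYs"]]})
token = ["#Covid_19", "#covid2019", "#covid19", "#covid_19", "#COVid", "#COVID__19"]

mapping = {t: "#COVID19" for t in token}

# Step 1: one row per hashtag; the original row index is repeated.
exploded = df["hashtag"].explode()

# Step 2: exact-match replacement of whole strings via the dict.
replaced = exploded.replace(mapping)

# Step 3: regroup by the original row index to rebuild the lists.
df["hashtag"] = replaced.groupby(level=0).apply(list)
```

One caveat worth checking on real data: a row holding an empty list explodes to NaN, so it comes back as a list containing NaN rather than as an empty list.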

Upvotes: 1

IoaTzimas

Reputation: 10624

You can do the following. Add similar lines if you have more elements to be replaced.

token = ['#Covid_19',
 '#covid2019',
 '#covid19',
 '#covid_19',
 '#COVid',
 '#COVID__19']

l=list(df.hashtag)
for i in range(len(l)):
    l[i]=['#COVID19' if x in token else x for x in l[i]]

df.hashtag=l
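If there are several groups of variants, one alternative to adding a line per group is a single dict that maps every variant to its canonical tag. This is only a sketch: the `canonical` dict, the second group (`#KobeRIP` to `#RIPKobe`) and the sample rows are invented for illustration.

```python
import pandas as pd

# Hypothetical mapping: each variant points at the canonical tag it should become.
canonical = {
    "#Covid_19": "#COVID19",
    "#covid19": "#COVID19",
    "#COVid": "#COVID19",
    "#KobeRIP": "#RIPKobe",  # assumed second group, for illustration only
}

# Made-up sample data in the same shape as df.hashtag.
df = pd.DataFrame({"hashtag": [["#Covid_19", "#love"],
                               ["#KobeRIP", "#covid19"]]})

# dict.get(t, t) returns the canonical tag if t is a known variant, else t itself.
df["hashtag"] = df["hashtag"].apply(lambda tags: [canonical.get(t, t) for t in tags])
```

A dict lookup also avoids scanning the token list once per hashtag, which may matter on a large dataframe.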

Upvotes: 3
