Chan

Reputation: 4301

How to output Excel files with multiple URLs in one cell with pandas?

One of my DataFrame elements contains text in HTML format: This is <a href="https://www.google.com">google</a> and this is <a href="https://www.yahoo.com">yahoo</a>

I want to save this DataFrame to an Excel file.

Can the Excel file show the string as "This is google and this is yahoo", with two URLs in one cell?

Thanks

Upvotes: 1

Views: 423

Answers (1)

Anwarvic

Reputation: 12992

You can pull the URLs out into their own column and strip the tags from the text, something like this:

import re
import pandas as pd

df = pd.DataFrame({"text": ['This is <a href="https://www.google.com">google</a> and this is <a href="https://www.yahoo.com">yahoo</a>']})

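# Collect every href value found in the cell into a list
df["links"] = df["text"].apply(lambda x: re.findall(r'<a href="(.+?)"', x))
# Replace each <a ...>label</a> with just its label
df["text"] = df["text"].str.replace(r"<a.+?>(.+?)</a>", r"\1", regex=True)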
print(df)
#                               text                                            links
#0  This is google and this is yahoo  [https://www.google.com, https://www.yahoo.com]
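If the HTML is less regular than in the example (extra attributes, different quoting), a parser may be more robust than a regex. Below is a minimal sketch using only the standard library's html.parser; the names AnchorSplitter and split_links are made up for illustration and are not part of pandas or any Excel library. It splits a cell into (text, url) fragments, which keeps each label paired with its link. As far as I know, an Excel cell can hold only one clickable hyperlink, so the fragments would most likely end up in separate columns or cells rather than as two live links in one cell.

```python
from html.parser import HTMLParser


class AnchorSplitter(HTMLParser):
    """Hypothetical helper: collect (text, url) fragments from an HTML string.

    url is None for text that sits outside any <a> tag.
    """

    def __init__(self):
        super().__init__()
        self.fragments = []
        self._href = None  # href of the <a> tag currently open, if any

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")

    def handle_endtag(self, tag):
        if tag == "a":
            self._href = None

    def handle_data(self, data):
        self.fragments.append((data, self._href))


def split_links(html):
    """Return a list of (text, url_or_None) tuples for one cell."""
    parser = AnchorSplitter()
    parser.feed(html)
    parser.close()  # flush any trailing text after the last tag
    return parser.fragments


cell = 'This is <a href="https://www.google.com">google</a> and this is <a href="https://www.yahoo.com">yahoo</a>'
fragments = split_links(cell)

# Plain text for display, and the URLs in the order they appear
plain_text = "".join(text for text, _ in fragments)
urls = [url for _, url in fragments if url is not None]
```

With a DataFrame like the one above, df["text"].apply(split_links) would give one fragment list per row, from which the plain text and the URL columns can be built before calling to_excel.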

Upvotes: 1
