Reputation: 4301
One of my dataframe elements contains text in HTML format: This is <a href="https://www.google.com">google</a> and this is <a href="https://www.yahoo.com">yahoo</a>
I want to save this dataframe to an Excel file.
Can the Excel file show the string as "This is google and this is yahoo", with the two URLs in one cell?
Thanks
Upvotes: 1
Views: 423
Reputation: 12992
You can do something like this:
import re
import pandas as pd

df = pd.DataFrame({"text": ['This is <a href="https://www.google.com">google</a> and this is <a href="https://www.yahoo.com">yahoo</a>']})
# Collect every href value in the cell into a list
df["links"] = df["text"].apply(lambda x: re.findall(r'<a href="(.+?)"', x))
# Replace each <a ...>label</a> with just its label
df["text"] = df["text"].str.replace(r"<a.+?>(.+?)</a>", r"\1", regex=True)
print(df)
#                                text                                            links
# 0  This is google and this is yahoo  [https://www.google.com, https://www.yahoo.com]
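If the HTML is less regular than in the example (extra attributes before `href`, single quotes, and so on), the regex can miss links. A minimal sketch of the same split using the standard library's `html.parser` might look like this. The names `LinkExtractor` and `split_html` are made up for illustration and are not part of pandas:

```python
from html.parser import HTMLParser


class LinkExtractor(HTMLParser):
    """Collect the visible text and the href values of an HTML fragment."""

    def __init__(self):
        super().__init__()
        self.text_parts = []  # pieces of visible text, in document order
        self.links = []       # href values of <a> tags, in document order

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_data(self, data):
        self.text_parts.append(data)


def split_html(fragment):
    """Return (plain_text, list_of_urls) for one HTML fragment."""
    parser = LinkExtractor()
    parser.feed(fragment)
    parser.close()  # flush any text still buffered by the parser
    return "".join(parser.text_parts), parser.links
```

You could apply `split_html` to the column with `df["text"].apply(...)` and then write the result with `df.to_excel("links.xlsx", index=False)` (the file name is just an example). As far as I know, an Excel cell can hold only one clickable hyperlink, so the two URLs would end up as plain text. It is probably safest to join them into one string first, e.g. `", ".join(links)`, since a list value may not be written as-is.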
Upvotes: 1