Delete HTML code and leave only the content? (html2text error)

Question

I have a CSV file of scraped data in which the prices are stored as HTML. I would like to keep only the number and the euro sign, and I am trying to use html2text to do this. (If you have a better alternative, please say so!) For example, one cell in the CSV looks like this:

I thought about using unescape from html2text, but the import itself fails with an ImportError. This is the code I would use:

import pandas as pd
import html2text
from html2text import unescape 

df = pd.read_csv('filename.csv')

print(df.head())

df.Price = df.Price.apply(unescape, unicode_snob=True)

but it gives me the error:

ImportError: cannot import name 'unescape' from 'html2text' (/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/html2text/__init__.py)
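The error message indicates that the installed html2text package does not expose unescape at the top level. One possible alternative, offered as a minimal sketch rather than a definitive fix, is to avoid html2text and use only the Python standard library: html.parser.HTMLParser drops the tags and decodes entities such as &euro;. The helper name html_to_text and the sample cell below are hypothetical, since the actual cell content is not shown in the question.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects only the text nodes of an HTML fragment."""

    def __init__(self):
        # convert_charrefs=True decodes entities (e.g. &euro;) inside text
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def html_to_text(cell):
    """Return the text of an HTML fragment with tags removed (hypothetical helper)."""
    parser = _TextExtractor()
    parser.feed(str(cell))
    parser.close()  # flush any text still buffered at the end of the input
    text = "".join(parser.parts)
    # Collapse whitespace, including the non-breaking space that &nbsp; decodes to
    return " ".join(text.split())


# Hypothetical sample cell; the real markup in the CSV may differ
sample = '<span class="price">12,99&nbsp;&euro;</span>'
print(html_to_text(sample))
```

With the DataFrame from the question, this could be applied as df.Price = df.Price.apply(html_to_text). Because the helper calls str() on each cell, missing values would come out as the string "nan" and may need separate handling. If only the entities need decoding and the tags should stay, html.unescape from the standard library does that on its own.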
