Reputation: 490
I've come across the following error about html5lib when trying to read HTML tables into a data frame.
Here is the code:
!pip install html5lib
!pip install lxml
!pip install beautifulsoup4
import pandas as pd
import html5lib
import lxml
from bs4 import BeautifulSoup
table_list = pd.read_html("http://www.psmsl.org/data/obtaining/")
This is the error:
ImportError Traceback (most recent call last)
<ipython-input-68-e24654a0a301> in <module>()
----> 1 table_list = pd.read_html("http://www.psmsl.org/data/obtaining/")
/home/sage/sage-8.0/local/lib/python2.7/site-packages/pandas/io/html.pyc in read_html(io, match, flavor, header, index_col, skiprows, attrs, parse_dates, tupleize_cols, thousands, encoding, decimal, converters, na_values, keep_default_na)
913 thousands=thousands, attrs=attrs, encoding=encoding,
914 decimal=decimal, converters=converters, na_values=na_values,
--> 915 keep_default_na=keep_default_na)
/home/sage/sage-8.0/local/lib/python2.7/site-packages/pandas/io/html.pyc in _parse(flavor, io, match, attrs, encoding, **kwargs)
737 retained = None
738 for flav in flavor:
--> 739 parser = _parser_dispatch(flav)
740 p = parser(io, compiled_match, attrs, encoding)
741
/home/sage/sage-8.0/local/lib/python2.7/site-packages/pandas/io/html.pyc in _parser_dispatch(flavor)
680 if flavor in ('bs4', 'html5lib'):
681 if not _HAS_HTML5LIB:
--> 682 raise ImportError("html5lib not found, please install it")
683 if not _HAS_BS4:
684 raise ImportError(
ImportError: html5lib not found, please install it
Any help would be much appreciated. Thanks
Upvotes: 24
Views: 61224
Reputation: 121
I ran into this off and on for a couple months and wasn't able to keep it failing long enough to troubleshoot. I know the library is loaded because this runs just fine most of the time. I even installed it again with no effect. Today I figured it out.
I was passing a list of HTML files to a function that read the tables into dataframes. The list was one entry longer than it should have been: the first filename was duplicated with a '~' as its first character and added to the list. Rather than try to find out why the extra file was in the list, I added a filter to the list parser that checks for '~' in the string and, if it is present, skips that entry. I haven't tested it much yet, but the error stopped after the code change.
If anyone knows what caused the extra filename to be created, I'd like to know.
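For reference, the filter described above could look roughly like the sketch below. The function name and the file names are made up for illustration, and it checks only for a leading '~' in the file name, which is slightly stricter than checking for '~' anywhere in the string. Files with a leading '~' are often temporary or lock files left by an editor or office application while the original is open, though that is only a guess for this case.

```python
import os


def drop_tilde_files(paths):
    """Return only the paths whose file name does not start with '~'."""
    return [p for p in paths if not os.path.basename(p).startswith("~")]


# Hypothetical file list; the second entry mimics the stray duplicate.
html_files = ["report.html", "~report.html", "summary.html"]
clean_files = drop_tilde_files(html_files)
print(clean_files)
```

The cleaned list can then be passed on to whatever function calls pd.read_html on each file.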
Upvotes: 0
Reputation: 2614
For my MacBook I used the following to install:
python3 -m pip install html5lib
I also upgraded pip using:
python3.11 -m pip install --upgrade pip
Once that was done, the problem was solved.
Upvotes: 0
Reputation: 11
I had this exact error show up while trying to read a saved .htm file in the Spyder IDE.
This code raised the html5lib error:
import pandas as pd
df = pd.read_html("F:\xxxx\xxxxx\xxxxx\aaaa.htm")
I knew I had html5lib installed and working correctly because I had other scripts that worked.
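It turned out the file path needed to be a raw string literal (putting an r in front of the file path), presumably because backslash sequences such as \a are otherwise treated as escape characters, which changes the path.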
This code works for me:
import pandas as pd
df = pd.read_html(r"F:\xxxx\xxxxx\xxxxx\aaaa.htm")
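To see what the r prefix changes, here is a small sketch using a made-up Windows path (not the path from the answer). In a normal string, \t and \a are escape sequences for a tab and a bell character, so the string no longer contains the intended backslashes. In a raw string, every backslash is kept as written.

```python
# Hypothetical path, chosen so that it contains the valid escapes \t and \a.
normal = "F:\tables\aaaa.htm"   # \t becomes a tab, \a becomes a bell character
raw = r"F:\tables\aaaa.htm"     # backslashes are kept literally

print(repr(normal))
print(repr(raw))
print(normal == raw)
```

With the normal string, pandas receives a path that does not exist, which may be why the reported error looks unrelated to the real cause.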
Upvotes: 1
Reputation: 1856
I ran into this error when I gave the wrong path to the local file I was trying to open. So also be sure that you're pointing to the right place!
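One way to catch a wrong path early is to check that the file exists before handing it to pd.read_html, so that the failure names the path instead of a parser library. This is only a sketch; the helper name and the paths are hypothetical.

```python
from pathlib import Path


def existing_file(path):
    """Return the path as a string if it points to a real file, else raise."""
    p = Path(path)
    if not p.is_file():
        raise FileNotFoundError(f"No such file: {p}")
    return str(p)


# "no_such_dir/tables.html" is a made-up path used only to show the failure.
try:
    existing_file("no_such_dir/tables.html")
except FileNotFoundError as exc:
    print(exc)

# Intended use (needs pandas and a real file):
#   table_list = pd.read_html(existing_file("tables.html"))
```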
Upvotes: 0
Reputation: 9008
As the error message says, html5lib is not installed. Run:
pip install html5lib
in your terminal.
If you are calling this from a Jupyter notebook (as you did with !), try restarting the kernel so that the newly installed packages are loaded.
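If the error persists after a restart, it may be that pip installed the package into a different Python than the one the kernel is running. A quick check, assuming Python 3, is to ask the running interpreter which of the parser packages it can actually find. The helper name below is made up for this sketch.

```python
import importlib.util
import sys


def missing_modules(names):
    """Return the module names that the current interpreter cannot find."""
    return [n for n in names if importlib.util.find_spec(n) is None]


# Shows which Python the kernel is using and which parser modules it lacks.
print("Interpreter:", sys.executable)
print("Missing:", missing_modules(["html5lib", "bs4", "lxml"]))
```

If html5lib is listed as missing, installing it with that same interpreter (the path printed above, followed by -m pip install html5lib) should target the right environment.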
Upvotes: 25