J. Serra

Reputation: 490

Error in reading html to data frame in Python “html5lib not found”

I've come across the following error about html5lib when trying to read an HTML table into a data frame.

Here is the code:

!pip install html5lib
!pip install lxml
!pip install beautifulsoup4

import html5lib
import lxml
import pandas as pd
from bs4 import BeautifulSoup

table_list = pd.read_html("http://www.psmsl.org/data/obtaining/")

This is the error:

ImportError                               Traceback (most recent call last)
<ipython-input-68-e24654a0a301> in <module>()
----> 1 table_list = pd.read_html("http://www.psmsl.org/data/obtaining/")

/home/sage/sage-8.0/local/lib/python2.7/site-packages/pandas/io/html.pyc in read_html(io, match, flavor, header, index_col, skiprows, attrs, parse_dates, tupleize_cols, thousands, encoding, decimal, converters, na_values, keep_default_na)
    913                   thousands=thousands, attrs=attrs, encoding=encoding,
    914                   decimal=decimal, converters=converters, na_values=na_values,
--> 915                   keep_default_na=keep_default_na)

/home/sage/sage-8.0/local/lib/python2.7/site-packages/pandas/io/html.pyc in _parse(flavor, io, match, attrs, encoding, **kwargs)
    737     retained = None
    738     for flav in flavor:
--> 739         parser = _parser_dispatch(flav)
    740         p = parser(io, compiled_match, attrs, encoding)
    741 

/home/sage/sage-8.0/local/lib/python2.7/site-packages/pandas/io/html.pyc in _parser_dispatch(flavor)
    680     if flavor in ('bs4', 'html5lib'):
    681         if not _HAS_HTML5LIB:
--> 682             raise ImportError("html5lib not found, please install it")
    683         if not _HAS_BS4:
    684             raise ImportError(

ImportError: html5lib not found, please install it

Any help would be much appreciated. Thanks

Upvotes: 24

Views: 61224

Answers (5)

GTaylor

Reputation: 121

I ran into this off and on for a couple of months and couldn't keep it failing long enough to troubleshoot. I knew the library was installed because the code ran fine most of the time, and reinstalling it had no effect. Today I figured it out.

I was passing a list of HTML files to a function that read the tables into dataframes. The list was one entry longer than it should have been: the first filename was duplicated with a '~' as its first character and added to the list. Rather than try to find out why the extra file was in the list, I added a filter to the list parser that checks for '~' in the string and, if present, skips that entry. I haven't tested it much yet, but the error stopped after the code change.

If anyone knows what caused the extra filename to be created, I'd like to know.
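
The filter described above could look roughly like the sketch below. The function name `drop_tilde_files` and the sample filenames are hypothetical, and it checks only whether the base name starts with '~', which is slightly narrower than looking for '~' anywhere in the string.

```python
from pathlib import Path


def drop_tilde_files(filenames):
    """Return only the names whose base name does not start with '~'."""
    return [name for name in filenames if not Path(name).name.startswith("~")]


# Hypothetical file list with one unwanted '~' duplicate.
files = ["report.htm", "~report.htm", "data/summary.htm"]
print(drop_tilde_files(files))
```

Some editors and office applications create temporary files with names like this while a document is open, which may be where the extra entry came from, though nothing in this answer confirms it.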

Upvotes: 0

Heider Sati

Reputation: 2614

For my MacBook I used the following to install:

python3 -m pip install html5lib

I also upgraded pip using:

python3.11 -m pip install --upgrade pip

Once done, the problem was solved.

Upvotes: 0

I_Might_Remember_This

Reputation: 11

I had this exact error show up while trying to read a saved .htm file in the Spyder IDE.

This code raised the html5lib error:

import pandas as pd
df = pd.read_html("F:\xxxx\xxxxx\xxxxx\aaaa.htm")

I knew I had html5lib installed and working correctly because I had other scripts that worked.

For whatever reason, the file path needed to be a raw string literal (an r in front of the file path). Presumably the backslashes in a normal string were being interpreted as escape sequences.

This code works for me:

import pandas as pd
df = pd.read_html(r"F:\xxxx\xxxxx\xxxxx\aaaa.htm")
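
To see why the r prefix can matter, here is a small sketch comparing a normal and a raw string. The path is hypothetical, chosen because it contains the `\t` and `\a` sequences that Python treats as escapes in a normal string literal.

```python
# Hypothetical Windows path containing \t and \a sequences.
plain = "F:\temp\aaaa.htm"   # \t becomes a tab, \a becomes a bell character
raw = r"F:\temp\aaaa.htm"    # backslashes are kept literally

# repr() makes the difference visible, including the control characters.
print(repr(plain))
print(repr(raw))
print(len(plain), len(raw))
```

The normal string no longer names a real file, so whatever receives it is not given the path that was intended. Writing the path with forward slashes is another common way to avoid the issue.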

Upvotes: 1

Yanofsky

Reputation: 1856

I ran into this error when I gave the wrong path to the local file I was trying to open. So also be sure that you're pointing to the right place!
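
One way to catch a bad path before it reaches pandas is to check that the file exists first. This is only a sketch: the helper name `existing_html_path` and the file name `tables.htm` are hypothetical.

```python
from pathlib import Path


def existing_html_path(path):
    """Return the path as a string, or raise FileNotFoundError if it is not a file."""
    p = Path(path)
    if not p.is_file():
        raise FileNotFoundError(f"No such HTML file: {p}")
    return str(p)


# Usage (assumes pandas is installed and "tables.htm" is a hypothetical file):
# import pandas as pd
# tables = pd.read_html(existing_html_path("tables.htm"))
```

With a check like this, a wrong path fails with a clear FileNotFoundError instead of the misleading html5lib message.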

Upvotes: 0

Yilun Zhang

Reputation: 9008

As the error message says, html5lib is not installed. Run:

pip install html5lib

in your terminal.

If you are calling it from a Jupyter notebook (as you did with !), restart the kernel afterwards so that the newly installed packages are loaded.
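
If the error persists after installing, it can help to confirm that the interpreter running your code can actually see the package. The following is a minimal sketch, assuming Python 3; the helper name `is_importable` is made up for illustration.

```python
import importlib.util
import sys


def is_importable(module_name):
    """Return True if the running interpreter can locate the named module."""
    return importlib.util.find_spec(module_name) is not None


# Show which Python is running, then check the parsers pandas.read_html may use.
print("Interpreter:", sys.executable)
for name in ("html5lib", "bs4", "lxml"):
    print(name, "importable:", is_importable(name))
```

If html5lib shows as not importable here even though pip reported success, pip may have installed it into a different Python than the one the notebook kernel uses.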

Upvotes: 25
