pandas.read_html gives "No tables found" error

Question

I am trying to use pandas to import the tables in this file. However, pd.read_html raises a "No tables found" error. Here is my code:

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from unicodedata import normalize
import html5lib

pd.read_html(html_file_path)

I don't understand why it is not working. Thanks.

Valerio · Accepted Answer

I get a different error: invalid literal for int() with base 10: '100%'. The HTML file sets the 'colspan' attribute to a percentage ('100%'), whereas the HTML spec requires colspan to be an integer, so pandas fails when it converts the value with int(). You can fix this by stripping the '%' before parsing, as suggested here:

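from io import StringIO

import pandas as pd
from bs4 import BeautifulSoup

with open("protein.html") as fp:
    soup = BeautifulSoup(fp, 'html.parser')

# Strip the '%' from every colspan value so pandas can parse it as an integer
all_colspan = soup.find_all(attrs={'colspan': True})
for colspan in all_colspan:
    colspan.attrs['colspan'] = colspan.attrs['colspan'].replace('%', '')

# read_html returns a list of DataFrames, one per table found;
# recent pandas versions expect literal HTML to be wrapped in StringIO
dfs = pd.read_html(StringIO(str(soup)))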
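
If BeautifulSoup is not available, the same cleanup could be sketched with a regular expression from the standard library. This swaps the BeautifulSoup approach above for plain text substitution, so it is cruder and only meant as an illustration. The function name strip_percent_from_colspan and the sample markup are hypothetical, not part of the original file or of pandas.

```python
import re

# Hypothetical helper (not from the original answer): removes a trailing '%'
# from numeric colspan values such as colspan="100%" or colspan=100%.
_COLSPAN_PERCENT = re.compile(r'(colspan\s*=\s*["\']?\s*\d+)\s*%', re.IGNORECASE)


def strip_percent_from_colspan(html: str) -> str:
    """Return the HTML text with '%' dropped from numeric colspan values."""
    return _COLSPAN_PERCENT.sub(r"\1", html)


# Made-up sample markup that mimics the problem described above
sample = '<table><tr><td colspan="100%">Header</td></tr></table>'
cleaned = strip_percent_from_colspan(sample)
print(cleaned)
```

The cleaned string can then be handed to pd.read_html in the same way as str(soup) above. On recent pandas versions it is safer to wrap it in io.StringIO first.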
