Fabiana

Reputation: 23

pandas.read_html gives "No tables found" error

I am trying to use pandas to import the tables in this file. However, pandas.read_html raises a "No tables found" error. Here is my code:

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from unicodedata import normalize
import html5lib

pd.read_html(html_file_path)

I don't understand why it is not working. Thanks.

Upvotes: 2

Views: 139

Answers (1)

Valerio

Reputation: 126

With this file I get a different error: invalid literal for int() with base 10: '100%'. The HTML file sets the colspan attribute to a percentage ('100%'), whereas the HTML specification requires colspan to be an integer, so the conversion with int() fails. One way to fix this is to strip the '%' from every colspan value before passing the HTML to pandas, as suggested here:

import pandas as pd
from io import StringIO
from bs4 import BeautifulSoup

with open("protein.html") as fp:
    soup = BeautifulSoup(fp, 'html.parser')

# colspan must be an integer, so strip the '%' from values such as '100%'
for cell in soup.find_all(attrs={'colspan': True}):
    cell.attrs['colspan'] = cell.attrs['colspan'].replace('%', '')

# read_html returns a list of DataFrames, one per table found
tables = pd.read_html(StringIO(str(soup)))
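
If BeautifulSoup is not available, the same clean-up can be sketched with a regular expression. This is only a rough alternative, not part of the fix above: the helper names strip_colspan_percent and read_tables are made up for illustration, and the pattern assumes the colspan value is quoted, as in colspan="100%". pandas is imported lazily so that the string clean-up can be tried on its own.

```python
import re
from io import StringIO

# Matches a quoted colspan value that ends in '%', e.g. colspan="100%".
# Assumption: the attribute value is wrapped in single or double quotes.
_COLSPAN_PERCENT = re.compile(
    r'''\b(colspan)\s*=\s*(["'])\s*(\d+)\s*%\s*\2''',
    re.IGNORECASE,
)


def strip_colspan_percent(html: str) -> str:
    """Return html with the '%' removed from quoted colspan values."""
    return _COLSPAN_PERCENT.sub(r'\1=\2\3\2', html)


def read_tables(html: str):
    """Hypothetical wrapper: clean the markup, then hand it to pandas."""
    import pandas as pd  # needs lxml, or bs4 plus html5lib, to parse HTML

    # read_html returns a list of DataFrames, one per table found.
    return pd.read_html(StringIO(strip_colspan_percent(html)))


if __name__ == "__main__":
    sample = '<table><tr><td colspan="100%">Title</td></tr></table>'
    print(strip_colspan_percent(sample))
```

Note that after the clean-up a value such as colspan="100" asks for a cell spanning one hundred columns, so the resulting DataFrame may contain many repeated columns that need trimming afterwards.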

Upvotes: 1
