Reputation: 23
I am trying to use pandas to import the tables in this file. However, pandas.read_html raises a "No tables found" error. Here is my code:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from unicodedata import normalize
import html5lib
pd.read_html(html_file_path)
I don't understand why it is not working. Thanks.
Upvotes: 2
Views: 139
Reputation: 126
I am getting a different error: invalid literal for int() with base 10: '100%'. This happens because the HTML file gives the colspan attribute a percentage value, whereas the HTML spec requires colspan to be an integer. One can fix this by stripping the '%' before parsing, as suggested here:
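import pandas as pd
from io import StringIO
from bs4 import BeautifulSoup

# Parse the file so the malformed attributes can be edited before pandas sees them.
with open("protein.html") as fp:
    soup = BeautifulSoup(fp, 'html.parser')

# Strip the '%' from every colspan value (e.g. '100%' -> '100').
all_colspan = soup.find_all(attrs={'colspan': True})
for colspan in all_colspan:
    colspan.attrs['colspan'] = colspan.attrs['colspan'].replace('%', '')

# read_html returns a list of DataFrames, one per table found.
# StringIO avoids the warning newer pandas versions emit for literal HTML strings.
dfs = pd.read_html(StringIO(str(soup)))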
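
If you would rather not depend on BeautifulSoup for this one cleanup, the same idea can be sketched with a regular expression from the standard library. This is a swapped-in technique, not what the linked suggestion does; the helper name strip_colspan_percent and the sample markup are made up for illustration, and the pattern assumes the value is plain digits followed by '%'.

```python
import re

# Matches colspan="100%", colspan='100%' or colspan=100% (case-insensitive).
# Group 1: attribute name and optional opening quote, group 2: the digits,
# group 3: optional closing quote. The '%' itself is left out of every group.
_COLSPAN_PERCENT = re.compile(r"""(\bcolspan\s*=\s*["']?)(\d+)%(["']?)""", re.IGNORECASE)


def strip_colspan_percent(html: str) -> str:
    """Return html with the trailing '%' removed from colspan values."""
    return _COLSPAN_PERCENT.sub(r"\1\2\3", html)


# Hypothetical sample input, standing in for the contents of the real file.
sample = '<table><tr><td colspan="100%">Protein</td></tr><tr><td>A</td></tr></table>'
cleaned = strip_colspan_percent(sample)
print(cleaned)
```

The cleaned string can then be handed to pd.read_html the same way as str(soup) above. Valid integer colspans are left untouched, but a regex is less robust than a real parser on unusual markup, so treat this as a quick workaround.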
Upvotes: 1