Reputation: 665
I used the requests module to fetch the contents of a webpage as an HTML object, then called pandas' read_html on that object. Instead of a dataframe, it gave me a giant list. The output looks like a dataframe, but type() says list, and I can't call dataframe methods on it.
This is the code I wrote for it after getting the HTML object:
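headers = {'User-Agent': ua.google}
response = requests.get(url, headers=headers)  # assumed: the request was not shown in the post; url is a placeholder
tables = pd.read_html(response.content)  # returns a list of DataFrames, one per <table>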
This is what it looks like when I call tables:
It looks right, and I can fix the bad data once it's in the form of a dataframe. But I can't figure out how to change it from type list to type dataframe, and I'm also not sure why it came back as a list instead of a dataframe in the first place.
As a second note, I tried using BeautifulSoup to parse the HTML and extract the table, which gave me just the contents of the table. But when I call pd.read_html(str(table)) on it and preview the result, I just get the site name and the bottom content in a list, not a dataframe.
Upvotes: 1
Views: 237
Reputation: 294258
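pandas.read_html returns a list of dataframes, one for each table it parses out of the HTML, even when the page contains only a single table. That is why the result prints like a dataframe but has type list.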
Try:
tables[0]
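For a fuller picture, here is a minimal sketch of the same idea. The HTML string is a made-up stand-in for the downloaded page (the real page is not shown in the question), and the column names are purely illustrative. It assumes an HTML parser such as lxml is installed, which read_html needs. It shows that read_html hands back a list, that indexing gives a real dataframe, and that the match argument can narrow the list when a page has several tables.

```python
from io import StringIO

import pandas as pd

# Hypothetical stand-in for response.content: a page with two tables.
html = """
<table>
  <thead><tr><th>name</th><th>score</th></tr></thead>
  <tbody>
    <tr><td>a</td><td>1</td></tr>
    <tr><td>b</td><td>2</td></tr>
  </tbody>
</table>
<table>
  <thead><tr><th>city</th><th>population</th></tr></thead>
  <tbody>
    <tr><td>X</td><td>10</td></tr>
  </tbody>
</table>
"""

# read_html always returns a list, one DataFrame per <table> it parses.
tables = pd.read_html(StringIO(html))
print(type(tables), len(tables))

# Index into the list to get an actual DataFrame.
df = tables[0]
print(type(df))
print(df.head())

# match= keeps only the tables whose text matches the string or regex.
city_tables = pd.read_html(StringIO(html), match="population")
city_df = city_tables[0]
print(city_df)
```

If the page has more than one table, printing len(tables) and looking at each element is usually the quickest way to find the index you want.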
Experimenting with Google's Colaboratory.
Notebook can be found on my github here
Upvotes: 2