Cdhippen

Reputation: 665

Pandas keeps creating lists instead of DataFrames out of HTML input

I used the requests module to fetch the contents of a webpage, then called pandas' read_html on the response. Instead of a DataFrame, I got a giant list. When printed it looks like a dataframe, but its type is list, and I can't call DataFrame methods on it.

This is the code I wrote for it after getting the HTML object:

headers = {'User-Agent': ua.google}

tables = pd.read_html(response.content)

This is what it looks like when I call tables:

Table

The data looks right, and I can fix the bad values once it's a DataFrame, but I can't figure out how to convert it from a list to a DataFrame. I'm also not sure why it came back as a list instead of a DataFrame in the first place.

As a second note, I also tried using BeautifulSoup to parse the HTML and extract the table, which gave me just the contents of the table. But when I call pd.read_html(str(table)) and preview the result, I only get the site name and the bottom content, again in a list rather than a DataFrame.

Upvotes: 1

Views: 237

Answers (1)

piRSquared

Reputation: 294258

pandas.read_html returns a list of DataFrames, one for each table it finds in the HTML, even when there is only one table.

Try:

 tables[0]
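To make the list behaviour concrete, here is a minimal sketch. The inline HTML string is a made-up stand-in for response.content (the real page isn't shown in the question), and it assumes an HTML parser that pandas can use, such as lxml, is installed.

```python
from io import StringIO

import pandas as pd

# Hypothetical stand-in for response.content: a page containing two tables.
html = """
<table>
  <thead><tr><th>name</th><th>score</th></tr></thead>
  <tbody>
    <tr><td>a</td><td>1</td></tr>
    <tr><td>b</td><td>2</td></tr>
  </tbody>
</table>
<table>
  <thead><tr><th>city</th></tr></thead>
  <tbody><tr><td>Oslo</td></tr></tbody>
</table>
"""

# read_html always returns a list, with one DataFrame per <table> it parsed.
tables = pd.read_html(StringIO(html))

# Print each table's position and shape to find the one you actually want.
for i, t in enumerate(tables):
    print(i, t.shape)

# Index into the list to get a real DataFrame with all the usual methods.
first = tables[0]
print(type(first))
print(first.head())
```

If the page has several tables, the one you want may not be at index 0, which could explain getting the site name and footer content instead of the data. Checking the shapes as above is a quick way to pick the right index.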

Experimenting with Google's Colaboratory.

Code runs here

Notebook can be found on my github here

Upvotes: 2
