wildcat89

Reputation: 1285

Convert "Empty Dataframe" / List Items to Dataframe?

I parsed a table from a website using Selenium (by xpath), then used pd.read_html on the table element, and now I'm left with what looks like a list that makes up the table. It looks like this:

[Empty DataFrame
Columns: [Symbol, Expiration, Strike, Last, Open, High, Low, Change, Volume]
Index: [],        Symbol  Expiration  Strike  Last  Open  High   Low  Change   Volume
0  XPEV Dec20  12/18/2020    46.5  3.40  3.00  5.05  2.49    1.08    696.0
1  XPEV Dec20  12/18/2020    47.0  3.15  3.10  4.80  2.00    1.02   2359.0
2  XPEV Dec20  12/18/2020    47.5  2.80  2.67  4.50  1.89    0.91   2231.0
3  XPEV Dec20  12/18/2020    48.0  2.51  2.50  4.29  1.66    0.85   3887.0
4  XPEV Dec20  12/18/2020    48.5  2.22  2.34  3.80  1.51    0.72   2862.0
5  XPEV Dec20  12/18/2020    49.0  1.84  2.00  3.55  1.34    0.49   4382.0
6  XPEV Dec20  12/18/2020    50.0  1.36  1.76  3.10  1.02    0.30  14578.0
7  XPEV Dec20  12/18/2020    51.0  1.14  1.26  2.62  0.78    0.31   4429.0
8  XPEV Dec20  12/18/2020    52.0  0.85  0.95  2.20  0.62    0.19   2775.0
9  XPEV Dec20  12/18/2020    53.0  0.63  0.79  1.85  0.50    0.13   1542.0]

How do I turn this into an actual dataframe, with the "Symbol, Expiration, etc..." as the header, and the far left column as the index?

I've been trying several different things, but to no avail. Where I left off was trying:

# From reading the html of the table step
dfs = pd.read_html(table.get_attribute('outerHTML'))
dfs = pd.DataFrame(dfs)

... and when I print the new dfs, I get this:

0  Empty DataFrame
Columns: [Symbol, Expiration, ...
1         Symbol  Expiration  Strike  Last  Open ...

Upvotes: 0

Views: 147

Answers (1)

Parfait

Reputation: 107687

Per pandas.read_html docs,

This function will always return a list of DataFrame or it will fail, e.g., it will not return an empty list.

According to your output, the list holds two DataFrames, and the non-empty one is the second element. Retrieve it by indexing (Python indexing starts at zero, so the second element is dfs[1]). Note that you can work with DataFrames directly while they are stored in a list or dict.

dfs[1].head()
dfs[1].tail()
dfs[1].describe()
...

single_df = dfs[1].copy()
del dfs

Or index in the same call:

single_df = pd.read_html(...)[1]
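
If the position of the populated table might change between page loads, one option is to pick the first DataFrame that actually has rows instead of hard-coding the index. The snippet below is only a sketch: the list is built by hand as a stand-in for the pd.read_html result (a few columns and rows copied from the question), so it runs without a browser or an HTML parser.

```python
import pandas as pd

# Stand-in for the list returned by pd.read_html (assumed shape:
# one empty table followed by the populated one, as in the question).
columns = ["Symbol", "Expiration", "Strike", "Last"]
dfs = [
    pd.DataFrame(columns=columns),  # the "Empty DataFrame" entry
    pd.DataFrame(
        [
            ["XPEV Dec20", "12/18/2020", 46.5, 3.40],
            ["XPEV Dec20", "12/18/2020", 47.0, 3.15],
        ],
        columns=columns,
    ),
]

# Take the first DataFrame that has at least one row.
# next() raises StopIteration if every table in the list is empty.
single_df = next(df for df in dfs if not df.empty).copy()

print(single_df)
```

The result already has the column names as its header and the default 0, 1, 2, ... integer index on the left, so no further conversion should be needed.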

Upvotes: 1
