waitingkuo

Reputation: 93754

How to convert an HTML table into a pandas DataFrame

pandas provides a useful to_html() method that converts a DataFrame into an HTML table. Is there a function that reads the table back into a DataFrame?

Upvotes: 10

Views: 10370

Answers (2)

waitingkuo

Reputation: 93754

Use the read_html utility, released in pandas 0.12.
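A minimal round-trip sketch of that approach, assuming a recent pandas with an HTML parser backend installed (lxml, or BeautifulSoup with html5lib). The sample frame and the variable names are illustrative only. read_html returns a list of DataFrames, one per table found, and index_col=0 tells it to treat the first HTML column as the index:

```python
from io import StringIO

import pandas as pd

# Small illustrative frame (not from the question)
df = pd.DataFrame({"a": [1.5, 2.5], "b": [3.0, 4.0]})

# Render to HTML, then parse it back; wrap the string in StringIO
# because newer pandas versions expect a file-like object for literal HTML
html = df.to_html()
tables = pd.read_html(StringIO(html), index_col=0)

# read_html always returns a list, even when there is a single table
restored = tables[0]
print(restored)
```

Floats are written with limited precision by to_html(), so the restored values may differ from the originals in the last digits for less tidy data.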

Upvotes: 8

elyase

Reputation: 40963

In the general case it is not possible, but if you know the approximate structure of your table you could do something like this:

# Create a test df:
>>> import numpy as np
>>> from pandas import DataFrame
>>> df = DataFrame(np.random.rand(4, 5), columns=list('abcde'))
>>> df
     a           b           c           d           e
0    0.675006    0.230464    0.386991    0.422778    0.657711
1    0.250519    0.184570    0.470301    0.811388    0.762004
2    0.363777    0.715686    0.272506    0.124069    0.045023
3    0.657702    0.783069    0.473232    0.592722    0.855030

Now parse the HTML and reconstruct the DataFrame:

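from pyquery import PyQuery as pq

# Parse the HTML produced by to_html()
d = pq(df.to_html())
# Column names come from the first header row (assumes names contain no whitespace)
columns = d('thead tr').eq(0).text().split()
n_rows = len(d('tbody tr'))
# Collect every <td> cell as a float and reshape to n_rows x n_columns
values = np.array(d('tbody tr td').text().split(), dtype=float).reshape(n_rows, len(columns))
>>> DataFrame(values, columns=columns)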

     a           b           c           d           e
0    0.675006    0.230464    0.386991    0.422778    0.657711
1    0.250519    0.184570    0.470301    0.811388    0.762004
2    0.363777    0.715686    0.272506    0.124069    0.045023
3    0.657702    0.783069    0.473232    0.592722    0.855030

You could extend it to MultiIndex DataFrames or to automatic type detection using eval() if needed.
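As one possible take on the type detection idea, here is a sketch that swaps eval() for pandas' to_numeric, which avoids executing text taken from the page. It assumes the cells were first collected as strings. infer_types is a hypothetical helper name, not something from pandas or from the answer above:

```python
import pandas as pd


def infer_types(frame):
    """Return a copy where every column whose values all parse as numbers is numeric."""
    converted = frame.copy()
    for name in converted.columns:
        try:
            # to_numeric raises if any value in the column is not a number
            converted[name] = pd.to_numeric(converted[name])
        except (ValueError, TypeError):
            pass  # leave columns with non-numeric text unchanged
    return converted


# Cells scraped from HTML arrive as strings
raw = pd.DataFrame({"a": ["1", "2"], "b": ["0.5", "1.5"], "c": ["x", "y"]})
typed = infer_types(raw)
print(typed.dtypes)
```

Columns that mix numbers and text stay as strings here; whether that is the right behavior depends on the table.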

Upvotes: 3
