Reputation: 6159
I have the below text:
text = """<table class="table table-striped">\n <thead>\n <tr>\n <th data-field="placement">Placement</th>\n <th data-field="production">Production</th>\n <th data-field="application">Eng.Vol.</th>\n <th data-field="body">Body No</th>\n <th data-field="eng">Eng No</th>\n <th data-field="eng">Notes</th>\n </tr>\n <tr>\n <td data-field="placement">Front Stabilizer</td>\n <td data-field="production">Oct 16~</td>\n <td data-field="application">1.5 L</td>\n <td data-field="body">HRW18</td>\n <td data-field="eng">L15BY</td>\n <td data-field="note" class="">\n Pos:Left/Right </td>\n </tr>\n <tr>\n <td data-field="placement">Front Stabilizer</td>\n <td data-field="production">Oct 16~</td>\n <td data-field="application">1.5 L</td>\n <td data-field="body">HRW18 LHD</td>\n <td data-field="eng">L15BY</td>\n <td data-field="note" class="">\n Pos:Left/Right </td>\n </tr>\n <tr>\n <td data-field="placement">Front Stabilizer</td>\n <td data-field="production">Oct 16~</td>\n <td data-field="application">1.5 L</td>\n <td data-field="body">HRW28</td>\n <td data-field="eng">L15BY</td>\n <td data-field="note" class="">\n Pos:Left/Right </td>\n </tr>\n <tr>\n <td data-field="placement">Front Stabilizer</td>\n <td data-field="production">Oct 16~</td>\n <td data-field="application">2.0 L</td>\n <td data-field="body">HRW38 RHD</td>\n <td data-field="eng">R20A9</td>\n <td data-field="note" class="">\n Pos:Left/Right </td>\n </tr>\n </thead>\n </table>"""
This HTML is properly closed with a </table> tag and has all the required tags, but pandas still does not read it as a table.
code:
pd.read_html(text)
output:
[Empty DataFrame
Columns: [(Placement, Front Stabilizer, Front Stabilizer, Front Stabilizer, Front Stabilizer), (Production, Oct 16~, Oct 16~, Oct 16~, Oct 16~), (Eng.Vol., 1.5 L, 1.5 L, 1.5 L, 2.0 L), (Body No, HRW18, HRW18 LHD, HRW28, HRW38 RHD), (Eng No, L15BY, L15BY, L15BY, R20A9), (Notes, Pos:Left/Right, Pos:Left/Right, Pos:Left/Right, Pos:Left/Right)]
Index: []]
Upvotes: 1
Views: 309
Reputation: 150735
Your whole table, data rows included, is wrapped inside <thead></thead>, so it's understandable that pandas interprets every row as part of the column headers. Let's try:
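# every <tr> ended up as one level of the column MultiIndex
tmp = pd.read_html(text)[0]
# turn the column labels back into ordinary cell values
pd.DataFrame(tmp.columns.to_frame().values)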
Output:
    0           1                 2                 3                 4
--  ----------  ----------------  ----------------  ----------------  ----------------
 0  Placement   Front Stabilizer  Front Stabilizer  Front Stabilizer  Front Stabilizer
 1  Production  Oct 16~           Oct 16~           Oct 16~           Oct 16~
 2  Eng.Vol.    1.5 L             1.5 L             1.5 L             2.0 L
 3  Body No     HRW18             HRW18 LHD         HRW28             HRW38 RHD
 4  Eng No      L15BY             L15BY             L15BY             R20A9
 5  Notes       Pos:Left/Right    Pos:Left/Right    Pos:Left/Right    Pos:Left/Right
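Note that the frame above is transposed relative to the original HTML: each field is a row and each table row is a column. If you want the usual layout, with the first <tr> as the header and the remaining rows as data, something like the following might work. This is only a sketch: `header_rows_to_frame` is a name I made up, and the small `parsed` frame is built by hand to mimic what `pd.read_html` returns for this kind of markup (an empty frame whose column MultiIndex holds every row), using two fields and two rows from the question.

```python
import pandas as pd


def header_rows_to_frame(df: pd.DataFrame) -> pd.DataFrame:
    """Rebuild a table whose rows were all parsed as header levels.

    Hypothetical helper: assumes df.columns is a MultiIndex where
    level 0 is the real header and each further level is one data row.
    """
    # Each column label is a tuple (header cell, row 1 cell, row 2 cell, ...).
    # zip(*...) regroups those tuples by level, i.e. by original <tr>.
    rows = list(zip(*df.columns))
    header, data = rows[0], rows[1:]
    return pd.DataFrame(data, columns=list(header))


# Hand-built stand-in for pd.read_html(text)[0] (assumption, see above).
cols = pd.MultiIndex.from_tuples(
    [
        ("Placement", "Front Stabilizer", "Front Stabilizer"),
        ("Body No", "HRW18", "HRW18 LHD"),
    ]
)
parsed = pd.DataFrame(columns=cols)

fixed = header_rows_to_frame(parsed)
print(fixed)
```

With the real data you would call `header_rows_to_frame(pd.read_html(text)[0])`. As far as I know, recent pandas versions (2.1 and later) deprecate passing a literal HTML string, so you may need `pd.read_html(io.StringIO(text))` there. If you control the HTML, closing </thead> after the first row and putting the rest in <tbody> avoids the problem altogether.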
Upvotes: 1