Kirk Broadhurst

Reputation: 28738

Transform HTML table into list of objects or dictionaries

I have a table like this:

<table>
   <tr class="first">
      <td class="id">A1</td>
      <td class="name">Scooby</td>
      <td class="flavor">Chocolate</td>
   </tr>
   <tr class="third">
      <td class="id">C3</td>
      <td class="name">Brian</td>
      <td class="flavor">Blue</td>
   </tr>
</table>

I'm trying to structure this into something more readable for analysis, and JSON seems like a good fit, so I intend to transform it into something like this:

{
   "first":{
      "id":"A1",
      "name":"Scooby",
      "flavor":"Chocolate"
   },
   "third":{
      "id":"C3",
      "name":"Brian",
      "flavor":"Blue"
   }
}

I can loop through the rows and cells in the table and construct dictionaries, but I'm wondering if there's something in the bs4 library that will do this for me, or any shortcut.

My looping code would look like this:

result = {}
for row in table.select('tr'):
    row_result = {}
    for cell in row.select('td'):
        row_result[cell['class'][0]] = cell.text
    result[row['class'][0]] = row_result

This works, but I'd prefer not to include it if there's a cleaner way.

Upvotes: 2

Views: 58

Answers (1)

Ajax1234

Reputation: 71471

You can use BeautifulSoup with a nested dictionary comprehension:

from bs4 import BeautifulSoup as soup

html = """
<table>
  <tr class="first">
    <td class="id">A1</td>
    <td class="name">Scooby</td>
    <td class="flavor">Chocolate</td>
  </tr>
  <tr class="third">
    <td class="id">C3</td>
    <td class="name">Brian</td>
    <td class="flavor">Blue</td>
  </tr>
</table>
"""

d = soup(html, 'html.parser')
# Outer comprehension: one entry per row, keyed by the row's first class.
# Inner comprehension: one entry per cell, keyed by the cell's first class.
result = {
    i['class'][0]: {b['class'][0]: b.text for b in i.find_all('td')}
    for i in d.find_all('tr')
}

Output:

{'first': {'id': 'A1', 'name': 'Scooby', 'flavor': 'Chocolate'}, 'third': {'id': 'C3', 'name': 'Brian', 'flavor': 'Blue'}}
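
Since the question asks for JSON, note that result is a plain Python dict, so the standard library's json module can serialize it directly. A minimal sketch, assuming the output shown above (the dict literal is copied from that output rather than re-parsed, and indent=3 is only chosen to mirror the layout in the question):

```python
import json

# Copied from the output above; in practice this would be the dict
# built by the comprehension.
result = {
    'first': {'id': 'A1', 'name': 'Scooby', 'flavor': 'Chocolate'},
    'third': {'id': 'C3', 'name': 'Brian', 'flavor': 'Blue'},
}

# json.dumps returns a str; indent only affects pretty-printing.
as_json = json.dumps(result, indent=3)
print(as_json)
```

Two caveats apply to both the loop and the comprehension: a row or cell without a class attribute raises KeyError, and rows that share a class overwrite each other because the class is used as the dictionary key.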

Upvotes: 3
