Reputation: 28738
I have a table like this:
<table>
<tr class="first">
<td class="id">A1</td>
<td class="name">Scooby</td>
<td class="flavor">Chocolate</td>
</tr>
<tr class="third">
<td class="id">C3</td>
<td class="name">Brian</td>
<td class="flavor">Blue</td>
</tr>
</table>
I'm trying to restructure this into something more readable for analysis. JSON seems like a good fit, so I intend to transform it into something like this:
{
"first":{
"id":"A1",
"name":"Scooby",
"flavor":"Chocolate"
},
"third":{
"id":"C3",
"name":"Brian",
"flavor":"Blue"
}
}
I can loop through the rows and cells in the table and construct dictionaries, but I'm wondering if there's something in the bs4 library that will do this for me, or any shortcut.
My looping code would look like this:
result = {}
for row in table.select('tr'):
    row_result = {}
    for cell in row.select('td'):
        row_result[cell['class'][0]] = cell.text
    result[row['class'][0]] = row_result
This works, but I'd prefer not to include it if there's a cleaner way.
Upvotes: 2
Views: 58
Reputation: 71471
You can use BeautifulSoup with a nested dictionary comprehension:
html = """
<table>
<tr class="first">
<td class="id">A1</td>
<td class="name">Scooby</td>
<td class="flavor">Chocolate</td>
</tr>
<tr class="third">
<td class="id">C3</td>
<td class="name">Brian</td>
<td class="flavor">Blue</td>
</tr>
</table>
"""
from bs4 import BeautifulSoup as soup

d = soup(html, 'html.parser')
# Outer keys: each row's first class; inner keys: each cell's first class.
result = {
    row['class'][0]: {cell['class'][0]: cell.text for cell in row.find_all('td')}
    for row in d.find_all('tr')
}
Output:
{'first': {'id': 'A1', 'name': 'Scooby', 'flavor': 'Chocolate'}, 'third': {'id': 'C3', 'name': 'Brian', 'flavor': 'Blue'}}
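Note that the output above is a Python dict, not JSON text. If you need an actual JSON string (for a file or an API, say), the standard-library json module should be able to serialize it directly. A minimal sketch, assuming result holds the dict shown above; it is repeated here as a literal only so the snippet runs on its own, and the name json_text is just for illustration:

```python
import json

# Literal copy of the dict shown in the output above,
# repeated here so this snippet runs without bs4.
result = {
    'first': {'id': 'A1', 'name': 'Scooby', 'flavor': 'Chocolate'},
    'third': {'id': 'C3', 'name': 'Brian', 'flavor': 'Blue'},
}

# indent=2 pretty-prints the JSON; keys keep the dict's insertion order.
json_text = json.dumps(result, indent=2)
print(json_text)
```

json.loads(json_text) should give back an equal dict, which is a quick way to check the round trip.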
Upvotes: 3